Things I Have Written About
Elsewhere #20090112



Living in the Donut Hole 






This post was originally published on the code.flickr.com Weblog http://code.flickr.com/blog/2009/01/12/living-in-the-donut-hole/ in January, 2009.



A long long time ago (2005 http://flickr.com/places/Canada/British+Columbia/Vancouver#2005) in a galaxy far far away (Vancouver http://flickr.com/places/Canada/British+Columbia/Vancouver#f: when we joined Yahoo! and moved FlickrHQ http://flickr.com/groups/flickrheadquarters/pool/ to the Bay Area, all but one or two members of the team lived within ten square blocks of each other in San Francisco's Mission District http://www.flickr.com/places/United+States/California/San+Francis

John Allspaw http://www.flickr.com/photos/allspaw, a long-time resident of the Mission http://flickr.com/places/United+States/California/San+Francisco/, used to regale us with stories of one of the neighbourhood's notable quirks, commonly referred to as the "donut hole": the rest of the city could be covered in fog, or raining, but the moment you crossed over into the Mission the sky would open up and the entire neighbourhood would be bathed in sunshine.

When John and George Oates http://www.flickr.com/photos/george and I used to car pool between the city and the offices in Sunnyvale http://flickr.com/places/United+States/California/Sunnyvale#f: we would drive up and down highway 280 http://www.flickr.com/search/?q=highway%20280&w=all and, sure enough, as we approached the city at the end of the day we would drive into an enormous blanket of fog the moment we passed the airport http://www.flickr.com/places/SFO in Millbrae. And as soon as we'd pulled off the San Jose exit there would be an open stretch of clear sky all the way to Civic Center http://flickr.com/places/United+States/California/San+Francisco/C where it would stop again just as suddenly.



Some mornings, when I look out my kitchen window at the clouds hanging over Diamond Heights http://www.flickr.com/places/United+States/California/San+Francis I like to pretend I can see the curvature of the inside of the donut hole itself. I was reminded of all this the other morning when I was generating some visualizations based on the shapefiles that are derived from the almost 100 million geotagged photos on Flickr http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/.




http://www.flickr.com/photos/straup/3180283176/



The larger, blue contour is the "shape" of the city of Paris http://www.flickr.com/places/France/Île-de-France/Paris (or WOE ID 615702 http://www.flickr.com/places/615702) according to Flickr. The smaller white contours are the child neighbourhoods of that WOE ID with public, geotagged photos. So, what's going on then?

The first outline maps roughly to the extremities of the RER http://flickr.com/photos/tags/rer/clusters/paris-france-train/, the commuter train that services Paris and the surrounding suburbs. This is a fairly accurate representation of the "greater metropolitan" area of Paris. Metropolitan areas, increasingly common in both popular folklore and government administrivia as more and more people shift from rural to urban living http://longnow.chubbo.net/salt-apr02005-brand/salt-apr02005-brand.mp3, are noticeably lacking from the Flickr hierarchy of place types, but that is a subject probably best left for another blog post.




The rest, taken as a whole, follow closer to the shape of the old city 
gates http://en.wikipedia.org/wiki/City_gates_of_Paris that most people 
think of when asked to imagine Paris. Which one is right? Well, both obviously! 



Cities long ago stopped being defined by the walls that surround(ed) them. There is probably no better place in the world to see this than Barcelona http://www.flickr.com/map?&fLat=41.3959&fLon=2.1749&zl=6&map_type=sat, which first burst out of its Old City http://www.flickr.com/places/Spain/Catalonia/Barcelona/Ciutat%20Vel with the construction of the Eixample http://flickr.com/places/Spain/Catalunya/Barcelona/l%27Eixample/ at the end of the 19th century and then again, after the wars, pushed further out towards the hills http://www.flickr.com/places/Spain/Catalonia/Barcelona/Gracia/ and rivers http://www.flickr.com/places/Espa%C3%B1a/Catalu%C3%B1a/Sant+Adri%C that surround it.

There are lots of reasons to criticize urban sprawl as a phenomenon but sprawl, too, is still made of people http://www.slideshare.net/george08/human-traffic-general-public-presentation who over time inherit, share and shape the history and geography they live in. Whether it's Paris, Los Angeles, William Gibson's dystopic "Boston-Atlanta Metropolitan Axis http://en.wikipedia.org/wiki/The_Sprawl" (BAMA) or the San Francisco "Bay Area", they all encompass wildly different communities who, in spite of the grievances harboured towards one another, often feel as much of a connection to the larger whole as they do to whatever neighbourhood, suburb or village they spend their days and nights in.




That's one reason I think it's so interesting to look at the shape of cities and see how they spill out beyond the boundaries of traditional maps and travel guides. In the example above the shape for Paris completely engulfs the commune of Orly http://www.flickr.com/map?&fLat=48.7403&fLon=2.4032&zl=5, 20 kilometers to the south of central Paris, which makes a certain amount of sense http://en.wikipedia.org/wiki/Orly.



It also contains Orly airport http://www.flickr.com/places/ORY which isn't that notable except that we treat airports as though they were cities in their own right, because the realities of contemporary travel mean that airports http://www.flickr.com/photos/tags/airport/clusters have evolved from being simple gateways to capital-P places http://www.flickr.com/places with their own culture, norms and gravity. So, now you have cities contained within cities which most people would tell you are just neighbourhoods.

We've recently finished rendering the second batch of shapefiles http://delicious.com/tag/flickr+shapefiles and, looking ahead, I am wondering whether we should also be rendering shapes based on the relationship of one place to another. Rendering the shape of the child places for a city or a country (you can do this using the handy, if awkwardly named, flickr.places.getChildrenWithPhotosPublic http://www.flickr.com/services/a API method) would allow you to see a city's "center" but also provide a way to filter out parts of a shape with low Earthiness (aka water).



http://www.flickr.com/photos/straup/3187717383/

The issue is not to prevent, or correct, shapes that provide a "false" view because I don't think they do. As Schuyler http://iconocla.st/ observed while we were getting all this stuff to work in the first place, and testing the neighbourhoods that meet San Francisco Bay, they are really the shapes of people looking at the city. They are each different, but the same.

But maybe we should also map the neighbourhoods that aren't considered
the immediate children of a city but which overlap its boundaries. What if you could 
call an API method to return the list or the shape of a place's "cousins"? What could 
that tell us about a place? 
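As a purely hypothetical illustration of what a "cousins" lookup might do, here is a sketch that treats every place as a bounding box and calls a place a cousin if its box overlaps the city's box but it isn't one of the city's children. The place names and boxes are made up, and real shapes are polygons rather than boxes, so this is a coarse first pass and not how Flickr does (or would do) it.

```python
from typing import Dict, List, Set, Tuple

# A bounding box as (min_lon, min_lat, max_lon, max_lat).
BBox = Tuple[float, float, float, float]


def overlaps(a: BBox, b: BBox) -> bool:
    """True if two boxes share any area (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cousins(city: BBox, children: Set[str], places: Dict[str, BBox]) -> List[str]:
    """Names of places whose boxes overlap the city's box but which
    are not among the city's children, sorted alphabetically."""
    return sorted(
        name for name, box in places.items()
        if name not in children and overlaps(city, box)
    )


if __name__ == "__main__":
    # Hypothetical boxes, for illustration only.
    city = (0.0, 0.0, 10.0, 10.0)
    places = {
        "downtown": (4.0, 4.0, 6.0, 6.0),      # a child, entirely inside
        "the_edge": (9.0, 9.0, 12.0, 12.0),    # straddles the boundary
        "far_away": (20.0, 20.0, 21.0, 21.0),  # no overlap at all
    }
    print(cousins(city, {"downtown"}, places))
```

Only "the_edge" comes back: "downtown" overlaps but is already a child, and "far_away" doesn't touch the city at all.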




http://www.flickr.com/photos/straup/3182764164/



What does all of this have to do with 

donuts http://flickr.com/places/United+States/California/San+Francisco/K

Nothing really, but it's a nice way to think about the problem and since we have a 
long and storied tradition of silly names for projects I imagine this one will stick 
too. 



There are no fixed dates yet for when, or whether, any of this will make its way in to the API but quite a lot of it could be done with API methods already available today. One change we have made is to add a new flickr.places.getShapeHistory API method http://www.flickr.com/services/api/flickr.places.getShapeHistory which includes pointers to all the shapefiles that have been rendered for a place. I have dim and distant memories of reasons not to do this in the past, but the exercise in making donut shapes makes me think I was wrong. The more data and "nubby bits" that people have to work with, the more interesting it will be for everyone.
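Since the whole post leans on the donut metaphor, here is a small sketch of what a shape with a hole means computationally: an outer ring plus one or more inner rings, where a point counts as "in" the shape only if it falls inside the outer ring and outside every hole. It uses the standard even-odd ray-casting test on made-up coordinates; nothing here is taken from Flickr's shapefile code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_ring(x: float, y: float, ring: Sequence[Point]) -> bool:
    """Even-odd ray casting: count the ring edges crossed by a ray from
    (x, y) towards +x; an odd number of crossings means inside."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the ray's y value can be crossed.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def point_in_shape(x: float, y: float, outer: Sequence[Point],
                   holes: List[Sequence[Point]]) -> bool:
    """Inside the outer ring and not inside any hole: the donut."""
    if not point_in_ring(x, y, outer):
        return False
    return not any(point_in_ring(x, y, hole) for hole in holes)


if __name__ == "__main__":
    outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
    print(point_in_shape(1, 1, outer, [hole]))   # True: in the dough
    print(point_in_shape(5, 5, outer, [hole]))   # False: in the hole
    print(point_in_shape(11, 5, outer, [hole]))  # False: outside entirely
```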



Enjoy! 
The Submergible Meat Zeppelin
and Other London Stories 



PaperCamp, the "nearby" version




There's actually quite a lot to write about the inaugural PaperCamp http://bookcamp.pbwiki.com/PaperCamp but I'm not going to do that yet.

There is already a fine collection http://delicious.com/tag/papercamp of write-ups, including Jeremy's rock-opera recap of the day's events http://adactio.com/journal/1546/, but I am going to hold off on saying too much until I finish the next iteration of the pocketMMap http://www.aaronland.info/weblog/2008/11/27/time/#hills books. It's one thing to have failed at getting two years' worth of papernet http://www.aaronland.info/papernet/ talks organized and printed in a handy magazine/book thingy twice in a row now, but quite another to not have (new) working code to show for a presentation. Talk is cheap and, since it's all still at the stage where we're forging nails to fill the toolbox with, so to speak, it seems wrong to keep talking without something to point at and say "like that, even if it is a false start."



As it happens, I spoke both at PaperCamp and the day before at the Guardian http://flickr.com/photos/paulcarvill/3201149037/ thanks to an invite from Simon Willison http://simonwillison.net/2008/Aug/22/employment/, so I figured I would simply post both sets of slides here with a short discussion around the larger motivation for each, the idea of creating "history boxes http://flickr.com/photos/straup/3040696271/", wrapping it all up in the bow-tie of a quiet little feature that's been percolating at work http://flickr.com/photos/heather/3220545032/.




Unfortunately, the False Starts department intervened to delay that last bit 
yet again so it's just me talking words. Again. 



Which is to say that although people seemed to enjoy both talks I was very conscious of how close each came to entering "I'm So Fucking Awesome!" territory. If I didn't cross the line I'm pretty sure my shadow did. I suppose that's the risk and burden of speaking about anything you've worked on for years and years but it is still an unpleasant indicator of a lack of imagination when it happens. Hopefully, there is now a sign-post marking those boundaries http://placeography.org/index.php?title=Main_Page that will weather the years and continue to be visible long before I ever get there again.

Which is to say that PaperCamp was otherwise fucking awesome. I am so 
excited by all the work people are doing and the ideas they are poking around. I 
remain very much convinced that it is too soon for anyone to bother trying to 
capital-U understand what it all means and it was lovely to see people rock- 
climbing the unknown looking for the proverbial foot-holds and orchids that 
usually escape the first pass. 

Matt Jones http://magicalnihilism.wordpress.com/2009/01/19/papercamp-prototyped/ deserves unbridled praise, and a Just Fucking Do It award http://flickr.com/photos/kevincollins/406215123/, for organizing the event and bringing everyone together and for always answering the "what can I do to help" question by saying: Organize your own event!

If you're anywhere within train-shot of the New York area, I would encourage you to attend PaperCamp NY http://www.barcamp.org/Papercamp-NY-2009, which is being held in Cohoes http://flickr.com/places/United+States/New+York/Cohoes at the beginning of February. Josh DiMauro http://blog.metacarpal.net/ is one of the organizers and the inspiration for the watch and learn http://flickr.com/photos/jazzmasterson/sets/72157607285811386/ slide (below) and if I hadn't just gotten back from London I would be heading to New York to do just that. Who knows, I've done crazier in the past...

Included below are the slides from my presentation. There are no notes to 
speak of since this was still a time when I delivered talks ad lib. In retrospect that 
was ...a thing. 



Theory

2006

2007

Everything I said I wouldn't do (or EDFG 2.0)

Next Steps

<you> :a "what you eat"

e(r)dfg-writer

The Illusion of Easy

The Papernet

Writing on the Wall

Talk Is Cheap

Cry like a baby

Maybe, finally?
PocketMMap

# Import the Atom feed provider from the author's pocketMMap library
from pocketMMap.Providers import Atom

# A feed of Dopplr tips for London
tips = "http://dopplr.com/place/gb/london/tips_feed"

# 8.5 x 11 presumably describes a US Letter page, in inches
mm = Atom(8.5, 11)
mm.draw_feed(tips)

Slide diagram: "buckets" -> fetch -> list -> map -> store -> format
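The slide only names the steps. As a rough, hypothetical illustration of the first two ("fetch" and "list"), here is a sketch that downloads an Atom feed and lists each entry's title and coordinates. It assumes the entries carry GeoRSS-Simple georss:point elements ("lat lon"); whether the Dopplr tips feed actually did is an assumption, and none of this is the pocketMMap API.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Namespace prefixes in ElementTree's {uri}tag notation.
ATOM = "{http://www.w3.org/2005/Atom}"
GEORSS = "{http://www.georss.org/georss}"

Entry = Tuple[str, Optional[float], Optional[float]]


def fetch(url: str) -> str:
    """'fetch': download a feed and return it as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def list_entries(feed_xml: str) -> List[Entry]:
    """'list': return (title, lat, lon) for every Atom entry; lat and
    lon are None when an entry has no georss:point."""
    root = ET.fromstring(feed_xml)
    entries: List[Entry] = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="").strip()
        point = entry.findtext(GEORSS + "point")
        if point:
            lat, lon = (float(v) for v in point.split())
            entries.append((title, lat, lon))
        else:
            entries.append((title, None, None))
    return entries
```

Chained together, list_entries(fetch(url)) would hand the later map, store and format steps a plain list of places to work with.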
Things I Have Written About

Things I'm Standing Next To



This post was originally published on the code.flickr.com Weblog http://code.flickr.com/blog/2009/02/09/things-im-standing-next-to/ in February, 2009.




wowed by Exposure finding pictures on flickr of things i'm standing next to in Soho. an eery demo of the power of locative data.

"The problem with these geolocative services is that they assume you're
a precise, rational human, behaving as economists expect. No latitude
for the unexpected; they're determined to replace every unnecessary
human interaction with the helpful guide in your pocket."

- Tom Taylor http://scraplab.net/2009/02/06/hereish-nowish.html



Back in June of 2008 http://tech.groups.yahoo.com/group/yws-flickr/message/4146 we added the ability to perform radial queries in the photos.search API method http://code.flickr.com/blog/2008/09/04/whos-on-first/. One of the earliest developers to use this feature was Fraser Speirs http://code.flickr.com/blog/2009/01/22/5-questions-for-fraser-speirs/, who used it for the "Near Me" feature of Darkslide http://speirs.org/2008/12/22/darkslide-15-post-mortem/, his Flickr application for the iPhone originally named Exposure http://www.connectedflow.com/blog/?p=98.
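A radial query boils down to a point and a distance. Here is a minimal sketch of building a flickr.photos.search request URL with its lat, lon, radius and radius_units arguments. The API key is a placeholder, the REST endpoint and JSON output parameters are assumptions about how you'd call it, and the coordinates (Whitecross Street, from the map link later in this post) are only an example.

```python
from urllib.parse import urlencode

REST_ENDPOINT = "https://api.flickr.com/services/rest/"  # assumed endpoint


def radial_search_url(api_key: str, lat: float, lon: float,
                      radius_km: float = 1.0, **extra: str) -> str:
    """Build a flickr.photos.search URL for photos taken within
    radius_km kilometers of (lat, lon)."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,     # placeholder: your own API key
        "lat": lat,
        "lon": lon,
        "radius": radius_km,
        "radius_units": "km",
        "format": "json",
        "nojsoncallback": 1,
    }
    params.update(extra)        # any other search arguments, e.g. extras="geo"
    return REST_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(radial_search_url("YOUR_API_KEY", 51.5232, -0.0929, extras="geo"))
```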

It is difficult to overstate how impressive, and important, tools like Google Earth http://www.paulhagon.com/thenSnow and PhotoSynth http://electronicmuseum.org.uk/2009/01/31/crowd-sourcing-photosynth/ are in visualizing geographies and in pushing the boundaries of what is possible both technically and conceptually. But both do so at the expense of what Scott McCloud calls "the magic in the gutter http://www.krisjordan.com/2008/09/09/why-the-google-chrome-comic-rocked-scott-mcclouds-invisible-art/". On the subject of the gutter, the space between the individual panels in a comic book, McCloud writes http://www.scottmccloud.com/store/books/uc.html:



"From an axe-murderer pursuing a frightened man in one panel to an 
ambiguous shriek in the next, what happened? You killed a character in 
your mind. The artist did nothing of the sort. Closure is the work done 
by a reader which takes two juxtaposed images and unifies them into a 
single idea." 



That's one of the things I like the most about "Near Me" and the ability to perform radial queries using the Flickr API: it affords a representation of place spun from a thicker, coarser yarn. Less precise, maybe, but richer in its own way. With radial queries there is the knowledge that all the photos were taken close to one another (so close in some cases that you can almost walk down a city block from one photo to the next) but the results are staggered http://vimeo.com/2721992, often overlapping one another, indoors and outdoors, in time and space.



That staggering is the breathing room http://speedbird.wordpress.com/2008/05/04/the-long-here-and-the-big-now/, the gutter, that lets viewers discover, imagine and create their own connections, their own closures, from not just one history of a place but also the patterning http://magicalnihilism.wordpress.com/2008/10/11/bionic-noticing-on-irving-street/ of all those who've passed through it.



"Whether mapping lost lakes of a different era or tracing the edges of 
disappeared lagoons that still haunt the streets of San Francisco — or 
reminding urbanites of the sport-fishing possibilities beneath Manhattan 
— we are alive within laminations we will never fully map or 
comprehend." 

- Geoff Manaugh http://bldgblog.blogspot.com/2009/02/watermarks.html



Earlier last week we enabled a quiet little feature that, hopefully, allows you to navigate some of that same mystery and serendipity in the 100 million geotagged photos http://code.flickr.com/blog/2009/02/04/100000000-geotagged-photos-plus/ on Flickr. We call it "nearby" and it is available for any geotagged photo on the site.

Nearby starts with a geotagged photo and then queries for other geotagged 
photos within a one kilometer radius. You can order the results by time and distance 
and interestingness but the important part is that they are photos, well, nearby to 
the photo you are looking at http://www.nearfuturelaboratory.com/2009/02/09/locative-play/.

Nearby is a deliberately fuzzy concept. Nearby in St. Peter's Square in Rome might 
mean the person directly in front of you. Nearby in the streets of a small town might 
be the beautiful garden behind the fence and around the corner. Nearby encourages 
people to poke around and discover their surroundings, as though they were on foot 
and everything was just a short walk away. 
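The basic idea is simple enough to sketch: measure the great-circle distance from the photo you're looking at, keep whatever falls within a kilometer, and sort by distance. The version below uses the haversine formula on hypothetical photo IDs placed around the Whitecross Street coordinates from the map link further down; it illustrates the concept and makes no claim about how Flickr actually implements nearby.

```python
import math
from typing import Iterable, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(center: Tuple[float, float],
           photos: Iterable[Tuple[str, float, float]],
           radius_km: float = 1.0) -> List[Tuple[str, float]]:
    """(photo_id, distance_km) for every photo within radius_km of
    center, closest first."""
    lat0, lon0 = center
    hits = []
    for photo_id, lat, lon in photos:
        distance = haversine_km(lat0, lon0, lat, lon)
        if distance <= radius_km:
            hits.append((photo_id, distance))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Hypothetical photos around Whitecross Street, London.
    photos = [
        ("far", 51.5700, -0.0929),    # roughly 5 km north
        ("close", 51.5277, -0.0929),  # roughly 0.5 km north
        ("here", 51.5232, -0.0929),   # the very same spot
    ]
    for photo_id, km in nearby((51.5232, -0.0929), photos):
        print(photo_id, round(km, 2))
```

Sorting by date taken or interestingness instead would only mean changing the sort key, which is part of why a fuzzy radius plus a few orderings goes a long way.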

For example, I went to London recently and while I was there the lovely crew at Dopplr http://www.dopplr.com/ took me around the corner from their offices to the Whitecross Street Market http://flickr.com/search/?q=whitecross+street+market for lunch. We ogled the cheese and I had a chorizo sandwich and we all took lots of pictures. This is a picture of my sandwich http://flickr.com/photos/straup/3215909195 and these are the photos taken nearby, including one of me taking a picture of my sandwich:
I have no idea what's up with the creepy smile. I blame... the jet-lag.



If you look carefully at the screenshot above, you can see that I've defined "nearby" to mean only photos taken by my contacts on the same day and sorted by their distance from my delicious sandwich. This is a very specific and personal view and its value for me is as a record of a shared experience with friends. But if I strip away all the conditions and visit the default "nearby" page http://www.flickr.com/photos/straup/3215909195/nearby today I can see the snow that's recently fallen on London, and the graffiti that I walked past, but didn't photograph, in Shoreditch, and Cal hanging out in the Moo/Dopplr offices http://flickr.com/photos/bees/3260677920/nearby/?by=owner&taken=recent&sort=datetaken&contacts=1&page=1&show=detail.



I can watch the place around my photo grow up while also nosing around in the shoebox of its past http://www.flickr.com/photos/mathowie/306330464/ and, at the same time, someone else can see that I ate a sandwich on Whitecross Street http://www.flickr.com/map?&fLat=51.5232&fLon=-0.0929&zl=1.

The default search criteria for nearby is all photos taken in the last month 
sorted by date taken (most recent, first) but there are lots of different ways to define 
what nearby looks like using the available filters: 

Who took a photo? (For the sake of brevity, "you" can also be taken to 
mean another photographer.) 

• Everyone 

• Everyone but you

• Only you 

• Only your contacts 

When was it taken? 

• In the last month 

• Today, you know, "today"

• The day that the photo was taken 

• All time 

How should those photos be sorted? 

• Date taken 



• By distance from the center point 

• By interestingness 

More sophisticated filters, like explicit date ranges, aren't yet available but 
we're definitely thinking about them and we'd love to hear how people would like 
to use nearby. 

The links for nearby pages are currently only available from the modal 
"map" dialog on individual photo pages or by URL hacking which is really just 
fancy talk for adding /nearby to the URL of any (geotagged) photo page. 
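That kind of URL hacking is easy to automate. The sketch below appends /nearby to a photo page URL and adds optional filters as query parameters. The only parameter names and values it relies on (taken=alltime, sort=distance) are ones visible in the links in this post; the full set of accepted values isn't spelled out here, so anything beyond those is a guess.

```python
from urllib.parse import urlencode


def nearby_url(photo_page_url: str, **filters: str) -> str:
    """Turn a (geotagged) photo page URL into its /nearby page URL,
    appending any filters as query parameters in the order given."""
    url = photo_page_url.rstrip("/") + "/nearby"
    if filters:
        url += "?" + urlencode(filters)
    return url


if __name__ == "__main__":
    print(nearby_url("http://www.flickr.com/photos/straup/3199232007/"))
    print(nearby_url("http://www.flickr.com/photos/powerhouse_museum/3022876525/",
                     taken="alltime", sort="distance"))
```

The two calls reproduce the cheesesteak and Powerhouse Museum links that appear below.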




Here's a picture of another sandwich, this time a Philly Cheesesteak http://www.flickr.com/photos/tags/cheesesteak/clusters/phila philly-food/, taken in San Francisco's Mission district:



http://www.flickr.com/photos/straup/3199232007/ 



And here are nearby photos: 

http://www.flickr.com/photos/straup/3199232007/nearby 
Or a photo of the Sydney Cricket Ground http://www.flickr.com/photos/powerhouse_museum/3022876525/, from the Powerhouse Museum's Collection, taken in 1900:

http://www.flickr.com/photos/powerhouse_museum/3022876525/ 

And photos taken nearby, on the very same grounds, 100 years later: 

http://www.flickr.com/photos/powerhouse_museum/3022876525/nearby?taken=alltime&sort=distance





Just like that! Well, okay, we added a couple of query parameters (to that last link) which most people aren't ever going to do but you get the idea.



We've also added a special page, which you can link to with any old latitude and longitude, and we'll show you photos near that point. This isn't a page that we expect people to visit directly (typing all those numbers in the location bar is pretty boring, really) but rather we hope that it will be used by third-party applications and devices which are location aware and can fire up a web browser.

For example, if I were wandering around the Metropolitan Museum of 

Art http://www.metmuseum.org/ , in New York City: 

http://www.flickr.com/nearby/40.779274,-73.963265

In the not so distant future, when web browsers are able to read your location http://dougt.wordpress.com/2008/08/08/geolocation-today/ using a GPS signal or wifi triangulation, then [REDACTED BY KITTENS] but for now this is just a little something to bridge the difference and a hook for people to use in their applications.

In the Easter Egg department we've also added support for geohashes http://www.geohash.org/ to the special "lat, lon" pages. The Wikipedia entry for geohashes http://en.wikipedia.org/wiki/Geohash describes them as "a hierarchical spatial data structure which subdivides space into buckets of grid shape . . . offering properties like arbitrary precision and the possibility of gradually removing characters from the end of the code to reduce its size (and gradually lose precision)." They're also just short(er).
When hashed, 40.779274,-73.963265 becomes dr5ruzts7p8k which is all
kinds of weird but that is true of all URL shortening services. Anyway, if the length 
of your URLs is an issue and the extra 8 characters are really going to make a 
difference you can sacrifice some of the built-in semantics in the longer version and 
use the following to link to the same photos from the Met: 

http://www.flickr.com/nearby/dr5ruzts7p8k 
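For the curious, geohash encoding is short enough to sketch in full. This follows the standard algorithm as described in the Wikipedia entry above: repeatedly halve the longitude and latitude ranges, alternating between them (longitude first), record a 1 or a 0 depending on which half the point falls in, and turn each group of five bits into one character of a 32-symbol alphabet. It's an illustration of the technique, not Flickr's code.

```python
# Geohash base32 alphabet: digits plus lowercase letters, minus a, i, l, o.
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 12) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # the current 5-bit group
    bit_count = 0
    use_lon = True    # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:                # five bits make one character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    full = geohash_encode(40.779274, -73.963265)
    print(full)                                    # dr5ruzts7p8k
    print(geohash_encode(40.779274, -73.963265, 5))  # dr5ru, a coarser cell
    print(len("40.779274,-73.963265") - len(full))   # 8
```

Truncating the precision is exactly the "gradually removing characters" property from the quote: the five-character hash is a prefix of the twelve-character one and names a larger cell containing the same point.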



Finally, during the time it's taken me to write this blog post I came across the SnarkMarket http://snarkmarket.com/blog/ blog and a piece quoting Ed Folsom's account of Walt Whitman's experience of "urban affection" http://snarkmarket.com/blog/snarkives/books_writing_such/perso which I think is a nice ribbon to wrap it all up with:



Whitman feels the power of the city of strangers. He's looking at a city 
of strangers and how something we might now call urban affection 
begins to develop. How do you come to care for people that you have 
never seen before and that you may never see again? 

Every day we encounter people, eyes make contact, we brush by people,
physically come into contact with them, and may never see them again.

"If I were doing that activity that person would be me. If I were
wandering the other way, rather than this way, that person could be me."
Things I Have Written About
Elsewhere #2009024



An Abundant Present 






This post was originally published on the flickr.com 
Weblog http://blog.flickr.net/en/2009/02/24/an-abundant-present/ in 
February, 2009. 






Recently we enabled a quiet little feature that, hopefully, allows users to 
navigate some of the mystery and serendipity in the 100 million geotagged photos 
on Flickr. We call it "nearby" and it is available for any geotagged photo on the site. 



Nearby starts with a geotagged photo and then queries for other geotagged 
photos within a one kilometer radius. You can order the results by time and distance 
and interestingness but the important part is that they are photos, well, nearby to the 
photo you are looking at. Nearby is a deliberately fuzzy concept. Nearby in St. 
Peter's Square in Rome might mean the person directly in front of you. Nearby in 
the streets of a small town might be the beautiful garden behind the fence and 
around the corner. Nearby encourages people to poke around and discover their 
surroundings, as though they were on foot and everything was just a short walk 
away. 
Or to quote Rick Prelinger from his fantastic talk at the Long Now Foundation, called "Lost Landscapes of San Francisco" http://fora.tv/2008/12/19/Rick_Prelinger_Lost_Landscapes_of_SE



"Knitting (geo) tags and images together is one tiny incremental step 
towards the creation of what you might call a four-dimensional model of 
the world that shows the development of place over time." 



There's an in-depth blog post titled "Things I'm Standing Next To http://code.flickr.com/blog/2009/02/09/things-im-standing-next-to/" on the code.flickr weblog http://code.flickr.com/blog/ which covers all the details but the really short version is that you can append /nearby to any geotagged photo URL and we'll show you photos . . .

nearby http://www.flickr.com/groups/flickrcommons/discuss/7215761362786;




Photos from russelldavies http://www.flickr.com/photos/russelldavies/, heather http://www.flickr.com/photos/heather, ldandersen http://www.flickr.com/photos/ldandersen/ (with apologies to Jacob Harris http://open.blogs.nytimes.com/author/jacob-harris/)
buckets of tangents



This is the story I want to remember 


Untitled Intimacies 

No training for trouble-shooting 



This is the story I want to remember 






see also: py-wsclustr http://github.com/straup/py-wsclustr/tree/master, testing py-wsclustr http://www.flickr.com/photos/straup/3428587847/ and Selflesh's photostream http://www.flickr.com/photos/selflesh/
Untitled Intimacies



I've been threatening to write an extra-long and extra-twisty blog post ever since PaperCamp http://bookcamp.pbwiki.com/PaperCamp that would tie a whole bunch of related ideas together (like a set of headphones knotted in your pocket). I can't even remember what half of them are, or how they fit together, now and the other half continue to be bogged down in messy coding and packaging details. So, I figure I might as well try to kick the problem in the untitled intimacy with the "your ~/bin directory" post and see what happens. Also, Myles complained about not having anything to read.

A few years ago, the now defunct Oreillynet.com http://web.archive.org/web/*/http://oreillynet.com website ran a series of articles called something like "your ~/bin directory". The idea was to get people to write about the quick and dirty scripts they wrote for themselves to solve the variety of problems and hurdles they encountered every day. They often weren't very elegant but they worked: they just fucking did it http://www.slideshare.net/straup/history-boxes-presentation/13, at least for that person. It was a lovely series because it was an especially comfortable window at which to sit and see how a person solved a problem and, just as importantly, which issues they simply chose to side-step.




Everything is easier when you're just writing for yourself, right? 

I was mentioning all this to Seth http://mojodna.net/ one day, during a meeting when I'd shown him the Untitled Intimacies http://www.flickr.com/photos/straup/sets/72157610685215313/ Twitter-posts-on-a-map pictures I'd been creating and uploading to Flickr. These are selected Twitter messages that have been geotagged and plotted on a stylized map http://mike.teczno.com/notes/arduino-atkinson.html with a big honking pinwin containing a cropped image of Twitter's own big honking pinwin, generated using a rasterized version of the site's HTML. I could have also used the API to draw the text of a post but I like the idea of including the actual markup, specific to a time and place in both the user's and the site's history.



Layers http://magicalnihilism.wordpress.com/2009/02/18/exporting-the-past-into-the-future-or-the-possibility-jelly-lives-on-the-hypersurface-of-the-present/ of context.

You can draw a pretty straight line from the early Net::Flickr::Geo http://www.aaronland.info/weblog/2007/06/08/pynchonite/#flickr-geo maps to the pinwins http://www.aaronland.info/weblog/2008/02/05/fox/#ws-modestmaps and then from history box http://www.aaronland.info/weblog/2008/07/27/invisible/#historybox to history box http://www.slideshare.net/straup/history-boxes-presentation to history box http://blog.flickr.net/en/2009/02/24/an-abundant-present/ to these and then back again to delicious maps http://www.aaronland.info/weblog/2007/08/24/aware/#delmaps_02 and, carrying on the grand tradition of one name stupider than the next, "bucket" maps http://www.slideshare.net/straup/taking-a-line-for-a-walk-presentation/28. There's a lot to talk about in all of that; I'm pretty sure that's at least one end to the thread with... no end, that I mentioned above. These days, it mostly just comes out in the form of drunken exhaustion, angry ranting and borderline character assassinations.

But that's a story for another day. 

What I was talking about, that day, was trying to do a "your ~/bin directory" style post about the code that generates the Twitter maps because I figured someone else might want to "geotag" their Twitter posts and I thought it would be nice to share. In that way where you want to share but don't want any of the burden of maintaining the code or making it work anywhere but your own setup.

Enter GitHub. 

I come slowly to revision control systems in part because they go in and out of fashion at roughly the same rate as boy-boy and girly-girl bands. I believe that some, like Git, can have real advantages http://speirs.org/2008/07/09/on-switching-to-git/ over others or have been tailored to fit a specific class of projects, but they are not the ones I work on. My needs are pretty pedestrian and while there may be perfectly good reasons that git commit does both exactly what it sounds like and not at all what you'd normally expect, given the semantics of all the version control systems that came before it, the disconnect has always made me a little wary of making the effort.

There's also the part where it seems like every single introductory HOWTO
on the subject of Git fails the "Hello World" test by requiring what feels like four 
times as many steps to do the most basic of things, like this: 



cvs checkout foo 

cd foo; touch bar 

cvs add bar 

cvs commit -m "ur mum uses svn" bar 

Git feels like the Java of version control which is fine since Java's good at a 
lot of things but generally seems to come at the cost of making simple things a 
nuisance. But hey, it also seems ideal for "your ~/bin directory" style projects! 
That's what Paul Hammond did when he released his MiniMuni 
webapp http://www.paulhammond.org/2008/12/minimuni/ . He just slapped it 
up on GitHub and said: 



As the about page says, if you live exactly 6 minutes from Sunset 
Tunnel East Portal, 8 minutes from Duboce and Church, and 10 minutes 
from Church Station you may find it useful too. 



Within a day, two or three people had found it useful enough to fork and to 
tweak to work within whatever (n) minute walk from the MUNI they lived near. I 
like this. I like this because it's fast and cheap and no one asked Paul to make his 
code any more complicated than it needed to be for him or fall into the rabbit hole
of abstraction; first another MUNI stop, then another transportation system, then the 



Simon Wistow is not wrong in his takedown of the Gitastic habit of
forking first and asking questions
later http://deflatermouse.livejournal.com/148975.html but I do think
there is a place for the sort of rapid cloning of simple projects described above and
GitHub seems to make it easier than most.

Which is what I chose to hear Seth say to me that day, even if he didn't. So, 
here it is: 

http://github.com/straup/untitled-intimacies/ 

It's mostly written in Perl and will require you to install stuff using the
"scary" CPAN. It currently only works on a Mac because it uses
webkit2png http://www.paulhammond.org/webkit2png/ and requires
PyObjC http://pyobjc.sourceforge.net/ . It shells out to Python. Twice. If
you have a custom background you will need to patch the crop_tweet.py script
which has about as much grace as a land mine. Depending on how your Mac is set
up it shells out to Python twice, to two different versions of Python; if you can get
PyObjC to build out of MacPorts, more power to you. It effectively shells out three
times if you count the call to the ModestMaps ws-pinwin
server http://modestmaps.com/examples-python-ws/advanced/ running on
localhost (and which is not started automatically).

Actually, if your Twitter stream is not public it will probably work on a (not
a Mac) computer, because the part where you could pass your (Twitter) credentials in a GET
request suddenly stopped working the other day and webkit2png, as it is written,
doesn't read from a cookies file, so I modified the code to work with a screen grab.



# usage (simple)
$> perl ./map_post.pl -c your.cfg -l '45.123,-37.939' -u http://twitter.com/you/status/1234

# use a screenshot instead of webkit2png (usually, because Twitter is doing
# weird auth-y stuff they won't explain to anyone...)
$> perl ./map_post.pl -c your.cfg -l '45.123,-37.939' -u /path/to/screenshot
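Both invocations pass a location as a single 'lat,lon' string. As an illustration only (map_post.pl is Perl, and its own argument handling isn't shown in this post), here is a minimal Python sketch of splitting and range-checking such a string. The function name is hypothetical, and the latitude-first order is an assumption read off the example value.

```python
from typing import Tuple


def parse_lat_lon(value: str) -> Tuple[float, float]:
    """Parse a 'lat,lon' string such as '45.123,-37.939'.

    Hypothetical helper; assumes latitude comes first.
    """
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 'lat,lon', got {value!r}")
    # float() raises ValueError for empty or non-numeric parts.
    lat, lon = (float(p.strip()) for p in parts)
    # NaN fails both comparisons, so it is rejected here too.
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon
```

Checking the range up front means a swapped or mistyped pair fails loudly instead of quietly producing a map of the wrong place.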

I'd like to think that it would be as easy to fix as fetching a Twitter page
using curl -C http://ask.metafilter.com/18923/How-do-you-handle-authentication-via-cookie-with-cuRL and rendering it as an image using
webkit2png on a local file. But given how completely hosed Twitter's persistent
cookies have been in the last month a more likely scenario will involve using
WWW::Mechanize http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize/Examples.pod and pretending to actually log in
to the site itself before writing an HTML file for the Twitter message in question to
disk.

Update. Or like this, which hasn't yet been plugged in to the rest of the code:
http://github.com/.../fetch_tweet.pl http://github.com/straup/untitled-intimacies/blob/92e1b4427e45906cb5406dec1795738c5c2e5afc/fetch_tweet.pl
I love Perl.
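The curl-and-webkit2png idea above comes down to two steps: fetch the page while sending an existing session's cookies, then write the HTML to disk so it can be rendered from a local file. A minimal sketch using only Python's standard library, not the Perl fetch_tweet.pl script. The function name is hypothetical, it assumes a Netscape-format cookies.txt (the format curl's cookie options read and write), and whether Twitter would accept those cookies is exactly the open question.

```python
import http.cookiejar
import urllib.request
from pathlib import Path
from typing import Optional


def fetch_to_disk(url: str, out_path: str, cookie_file: Optional[str] = None) -> int:
    """Fetch url, optionally sending cookies from a cookies.txt, and write the
    raw response body to out_path. Returns the number of bytes written."""
    jar = http.cookiejar.MozillaCookieJar()
    if cookie_file is not None:
        # Assumption: a Netscape-format cookies file from a logged-in session.
        jar.load(cookie_file, ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(url) as response:
        body = response.read()
    Path(out_path).write_bytes(body)
    return len(body)
```

Usage would be something like fetch_to_disk("http://twitter.com/you/status/1234", "tweet.html", "cookies.txt"), followed by pointing webkit2png at the local tweet.html.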



But otherwise it just works. I have pictures, and everything, to prove it. 

There are a lot of moving parts. Most are things that I use, or prefer to use,
for a variety of related tasks: Python for maps and images, Perl for Flickr, Java for
barcodes http://mike.teczno.com/notes/walking-papers.html and so on.
The important part in that is: preferred. I briefly considered rewriting
map_post.pl in Python so that it could load the ModestMaps code natively but
then realized that I had already written an HTTP wrapper in
Perl http://search.cpan.org/dist/Net-ModestMaps/ , I would have to rewrite
all the Flickr code in Python and mostly I just wanted something that worked "now"
and didn't really care about building a shiny tower of idealized "beautiful" code. I
can live with that.

All in all, I've probably forgotten some of the grunt work in setting things
up http://www.aaronland.info/weblog/2008/07/27/invisible/#historybox
but, then again, these days almost everything installs out of one package manager or
another so it's mostly a
question http://www.aaronland.info/weblog/2008/04/30/warstories/#filtr03
of typing "install x" over and over while you read the paper for a little while.

Welcome to my ~/bin directory. 
No training for trouble-shooting



tell me I forget 
show me I remember 
involve me I understand 



I was asked to do a short presentation to the Flickr team on the stuff I saw at
ETech 2009. I don't normally take notes at conferences, and I was only there for
two days, so this was the best I could do on short notice. There is only one image. 

It is also difficult to go to a conference like ETech and not confuse, or 
project, what you're doing with what people are talking about so I didn't bother 
trying to do otherwise. I'm not going to get in to what those connections are, or 
might be, because that would be kissing and telling so you'll just have to find your 
own associations. 



I would love to hear what other people would say using only these slides as 
the source material for their own talk. This one is taken from the last slide from Eric 
Paulos' presentation which in turn was taken from a Chinese proverb. 



design fictions 



Julian Bleecker http://en.oreilly.com/et2009/public/schedule/detail/7514

Design Fiction (Design Engaged 2008) http://www.slideshare.net/bleeckerj/design-fiction-design-engaged-julian-bleecker-presentation-638179



information shadows 



Mike Kuniavsky http://en.oreilly.com/et2009/public/schedule/detail/55

ETech 2009: The Dotted-Line World http://www.orangecone.com/archives/2009/03/etech_2009_the.



music that can 

assume the force of 

law 



Benjamin Bratton http://en.oreilly.com/et2009/public/schedule/detail/7634

Undesigning the Emergency: Against Prophylactic Urban Membranes http://www.bratton.info/emergency.html



mapsfromscratch.com 



I was a booth bunny for this workshop! 
• Mike Migurski, Shawn Allen http://en.oreilly.com/et2009/public/schedule/detail/5555



Maps from Scratch http://www.mapsfromscratch.com 



connecting documents 



vs. 



sharing knowledge 



Joi Ito http://en.oreilly.com/et2009/public/schedule/detail/6670

Expanding the Public Domain: Part Zero http://creativecommons.org/weblog/entry/13304



design for 
abandonment 



It is poor form to talk about climate change, and personal footprints, and 
then have the nerve to say things like: If everyone lived like me, we'd need 42 
planets. I have no idea if I said this or if Chris did... 



Chris Luebkeman http://en.oreilly.com/et2009/public/schedule/detail/7



in ur tubes ... explodin 



Christa Hockensmith http://en.oreilly.com/et2009/public/schedule/detail/



ted Stevens was right 



Molly Wright Steenson http://en.oreilly.com/et2009/public/schedule/detail/698(

It's really just a series of tubes http://radar.oreilly.com/2009/04/its-really-just-a-series-of-tu.html



avatars of a service 



I would love to be able to think of something other than iTunes when I look 
at an Airport Express but since the hardware and software are designed (read: locked)
to prevent that, Mike's point bears that much more truthiness. 



Mike Kuniavsky http://en.oreilly.com/et2009/public/schedule/detail/55



apple store, nyc 



As an aside (to ETech entirely) these people have been researching using 
both tags and explicit geo data, associated with Flickr photos, to infer points of 
interest in large cities. It turns out that the Apple store is the fifth most interesting 
"place" in New York City. 



D. Crandall, L. Backstrom, D. Huttenlocher, J. Kleinberg. Mapping the World's Photos (WWW09) http://www.cs.cornell.edu/home/kleinber/www09-photos.pdf
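The paper describes its own method for finding landmarks, and what follows is not that method. As a much cruder illustration of inferring "interesting" places from geotagged photos, you could bin photos into a fixed grid and rank cells by the number of distinct photographers, so that one prolific user can't make a spot look popular on their own. This minimal Python sketch ignores tags entirely; the input shape, the function name and the cell size are all assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Cell = Tuple[int, int]


def rank_cells(
    photos: Iterable[Tuple[str, float, float]], cell_deg: float = 0.001
) -> List[Tuple[Cell, int]]:
    """Rank grid cells by how many distinct people photographed inside them.

    photos: hypothetical (owner_id, lat, lon) triples.
    cell_deg: cell size in degrees, an arbitrary choice
    (0.001 degrees of latitude is roughly 110 m).
    """
    owners: Dict[Cell, Set[str]] = defaultdict(set)
    for owner, lat, lon in photos:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        owners[cell].add(owner)
    # Most distinct photographers first; ties broken by cell for stable output.
    return sorted(
        ((cell, len(people)) for cell, people in owners.items()),
        key=lambda item: (-item[1], item[0]),
    )
```

Counting people rather than photos is the design choice that matters here; a fixed grid is simply the easiest way to show it.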



history boxes 



This was just me projecting. 

• History Boxes http://www.slideshare.net/straup/history-boxes-presentation

An Abundant Present http://blog.flickr.net/2009/02/24/an-abundant-present/



symbolic reality 
static relationship 



David Merrill, Jeevan Kalanithi http://en.oreilly.com/et2009/public/schedule/detail/54

Siftables http://web.media.mit.edu/~dmerrill/siftables.html



say yes 

ask questions 

later 



Tarikh's talk is near the 36 minute mark. Meanwhile, it is comforting (I 
think) to know that someone can find a deeper meaning in Ashot passed out on the 
couch. 

• Tarikh Korula http://en.oreilly.com/et2009/public/schedule/detail/6980

Uncommon Projects http://uncommonprojects.com/



bicycle built for 2000 



Aaron Koblin http://en.oreilly.com/et2009/public/schedule/detail/6758



Bicycle built for 2000 http://bicyclebuiltfortwothousand.com 



think about the data 
not the real world 



Aaron Koblin http://en.oreilly.com/et2009/public/schedule/detail/6758




This is me projecting, again, and getting all weak in the knees at the idea of 
using Aaron's flight pattern map tiles for Flickr photos that have been geotagged at 
airports. 

• Aaron Koblin http://en.oreilly.com/et2009/public/schedule/detail/6758

Flight Patterns http://www.aaronkoblin.com/work/flightpatterns/index.htm



no training for 
trouble-shooting 



This one seems really important to me. We don't do enough of it in general 
and certainly not when we're all hand-waving and making happy-talk about the
future. 



Eric Rasmussen, Eduardo Jezierski http://en.oreilly.com/et2009/public/schedule/detail/66E



blister packs of 
functionality 



Eric Paulos http://en.oreilly.com/et2009/public/schedule/detail/5565



a network is 

a collection of 

interfaces 



A collection of interfaces becomes a network; A collection of networks 
becomes a territory; A territory exposes interfaces. 

• Benjamin Bratton http://en.oreilly.com/et2009/public/schedule/detail/7634

Undesigning the Emergency: Against Prophylactic Urban Membranes http://www.bratton.info/emergency.html




Ben Cerveny http://en.oreilly.com/et2009/public/schedule/detail/5558



to protect the 

necessary condition 

of churn 



Benjamin Bratton http://en.oreilly.com/et2009/public/schedule/detail/7634

Undesigning the Emergency: Against Prophylactic Urban Membranes http://www.bratton.info/emergency.html



the parliament of 
things 



Bruno Latour; I'm not sure that I buy this in practice but it is interesting 
protective gear for touching the possibility jelly with. 



pachube 



It's like FireEagle for your sensor-world, complete with all the sticky
questions about (near) real-time and historical data, privacy and two-way sync.

• Usman Haque http://www.pachube.com/



profit! 



notes and links from etech 2009



There's also a PDF version /weblog/2009/03/14/buckets/etech09_notes_links.pdf available here, for safe-keeping.
Things I Have Written Elsewhere

#20090407 



The Only Question Left Is 






This post was originally published on the code.flickr.com weblog 
in April, 2009. 



At the Emerging
Technology http://en.oreilly.com/et2009/public/content/home
conference this year Stamen Design's http://www.stamen.com/ Michal
Migurski and Shawn Allen led an afternoon workshop called "Maps from
Scratch: Online Maps from the Ground
Up http://en.oreilly.com/et2009/public/schedule/detail/5555"
where people made digital maps from, well... scratch.

If you've never heard of Stamen, they've been doing some of the most
exciting work around the idea of "custom
cartography http://mike.teczno.com/notes/oakland-crime-maps/XI.html" including: Cabspotting http://cabspotting.org/ ,
Oakland Crimespotting http://oakland.crimespotting.org/ and Old
Oakland Maps http://teczno.com/old-oakland/ , work for the
London Olympics http://www.tom-carden.co.uk/2009/02/12/new-maps-at-london2012com/ , and designing custom map tiles for
CloudMade http://www.sensescape.com/2009/02/cloudmade/
(Stamen also built the recently launched Flickr
Clock http://www.flickr.com/explore/clock :-)

All of this is interesting in its own right; proof that there is still a lot
of room in which to imagine maps beyond so-called red-dot
fever http://mappinghacks.com/2006/04/07/web-map-api-roundup/ .
All of this is extra interesting in light of Apple's recent announcement to
allow developers to define their own map
tiles http://arstechnica.com/apple/news/2009/03/iphone-sdk-focus-maps-from-your-apps.ars in the next iPhone OS release. All of this is super-duper
interesting because it is work produced by a team of fewer than 10
people.

The tools http://www.osgeo.org/ , and increasingly the
data http://magicalnihilism.wordpress.com/2009/04/03/data-as-seductive-material/ , to build the maps we
want http://blog.everyblock.com/2008/feb/18/maps/ are bubbling up
and becoming easier and more accessible to more people every day. Easier,
anyway.



"One of the things that made this tutorial especially interesting 
for us was our use of Amazon's EC2 service, the "Elastic 
Compute Cloud" that provides billed-by-the-hour virtual servers 
with speedy internet connections and a wide variety of 
operating system and configuration options. Each participant 
received a login to a freshly-made EC2 instance (a single 
server) with code and lesson data already in-place. We walked 
through the five stages of the tutorial with the group coding 
along and making their own maps, starting from incomplete 
initial files and progressing through added layers of complexity. 

"Probably the biggest hassle with open source geospatial 
software is getting the full stack installed and set up, so we've 
gone ahead and made the AMI (Amazon Machine Image, a 
template for a virtual server) available publicly for anyone to 
use, along with notes on the process we used to create 
it http://www.mapsfromscratch.com ." 

— Michal Migurski http://mike.teczno.com/notes/maps-from-scratch.html



The Maps From Scratch (MFS) AMI may not be a Leveraged Turn
Key Synergistic Do-What-I-Mean http://www.catb.org/~esr/jargon/html/D/DWIM.html Solutions
Platform but, really, anything that dulls the hassle and cost of setting up
specialized software is a great big step in the right direction. I mention all of
this because Clustr, the command-line application we use to derive
shapefiles from geotagged
photos http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/ , has recently been added to the list of tools bundled with the MFS
AMI.

Specifically: ami-4d769124. 
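Clustr's polygons are alpha shapes, built with CGAL (the install steps at the end of this post list libcgal-dev). Reproducing them is beyond a quick example, but one property gives a feel for the alpha parameter: in the standard formulation, as alpha grows the alpha shape of a point set approaches its convex hull, while smaller values carve out holes and split the points into separate components. Here is a minimal Python sketch of that limiting case, Andrew's monotone chain convex hull over (lon, lat) points. It is an illustration, not Clustr's code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise (or straight) turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

An interior point, such as the middle of a square, never appears in the hull. A small-alpha shape of the same points could instead come back as several pieces.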



We're super excited about this because it means that Clustr is that
much easier for people to use. We expressly chose to make Clustr an open-source
project to share some of the tools we've developed with the
community but it has also always had a relatively high barrier to entry.
Building and configuring a Unix machine is often more than most people are
interested in, let alone compiling big and complicated maths libraries from
scratch. Clustr on EC2 is not a magic pony factory but hopefully it will make
the application a little friendlier.




Creating and configuring an EC2 account is too involved for this post
but there are lots of good resources out there, starting with Amazon's own
documentation http://aws.amazon.com/ec2/ . When I'm stuck I usually
refer back to Paul Stamatiou's How To: Getting Started with Amazon
EC2 http://paulstamatiou.com/2008/04/05/how-to-getting-started-with-amazon-ec2 .



Assuming that you're familiar with Unix command-line tools, let's also
assume that you have gotten all your ducks in a row and are ready to fire up 
the MFS AMI: 

your-computer> ec2-run-instances ami-4d769124 -k example-keypair 
your-computer> ec2-describe-instances 

At which point, you'll see something like this: 

INSTANCE i-xxxxxxxx ami-4d769124 ec2-xxxxx.amazonaws.com blah blah blah 

i-xxxxxxxx is the unique identifier of your current EC2 session. 
You will need this to tell Amazon to shut down the server and stop billing 
you for its use. 

ec2-xxxxx.amazonaws.com is the address of your EC2 server
on the Internets. 
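If you script this step, both values can be pulled out of the INSTANCE line. Here is a minimal Python sketch. It assumes whitespace-separated fields laid out as in the example above, with the instance ID second and the public hostname being the first later field that ends in .amazonaws.com. The real tool prints more columns, and its layout may differ between versions.

```python
from typing import List, Tuple


def parse_instances(output: str) -> List[Tuple[str, str]]:
    """Return (instance_id, public_hostname) pairs from ec2-describe-instances
    output. Lines that don't start with INSTANCE are skipped."""
    instances = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "INSTANCE":
            continue
        # Assumption: the public hostname is the first later field ending in
        # .amazonaws.com; an empty string means none was found.
        hostname = next((f for f in fields[2:] if f.endswith(".amazonaws.com")), "")
        instances.append((fields[1], hostname))
    return instances
```

The ID is what ec2-terminate-instances needs at the end, and the hostname is what ssh and scp connect to.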

Once you have that information, you can start using Clustr. First, log 
in and create a new folder where you'll save your shapefile: 

your-computer> ssh -i example-rsa-key root@ec2-xxxxx.amazonaws.com 
ec2-xxxxx.amazonaws.com> mkdir /root/clustr-test 

The MFS AMI comes complete with a series of sample "points" files 
to render. We'll start with the list of all the geotagged photos uploaded to 

Flickr http://www.flickr.com/photos/revdancatt/3398050524/ on 
March 24: 

ec2-xxxxx.amazonaws.com> /usr/bin/clustr -v -a 0.001 \
/root/clustr/start/points-2009-03-24.txt \
/root/clustr-test/clustr-test.shp

By default Clustr generates a series of files named clustr (dot
shp, dot dbf and dot shx because
shapefiles http://en.wikipedia.org/wiki/Shapefile are funny that
way) in the current working directory. You can specify an alternate name by
passing a fully qualified path as the last argument to Clustr. When run in
verbose mode (that's the -v flag) you'll see something like this:



Reading points from input. 

Got 44410 points for tag '20090324'. 

799 component(s) found for alpha value 0.001. 

- 23 vertices, area: 86.7491, perimeter: 71.9647 

- 32 vertices, area: 1171.51, perimeter: 41.3095 

- 8 vertices, area: 18.5112, perimeter: 0.529504 

- 12 vertices, area: 1484.81, perimeter: 10.8544 

Writing 505 polygons to shapefile. 

Yay! 
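Each component in the verbose output comes with an area and a perimeter. For a polygon ring, those two numbers are the shoelace-formula area and the sum of the edge lengths. Here is a minimal Python sketch, working in whatever planar units the coordinates use. This is not necessarily how Clustr computes or scales its figures, and the units of its output aren't stated here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def ring_area_perimeter(ring: Sequence[Point]) -> Tuple[float, float]:
    """Area (shoelace formula) and perimeter of a simple polygon ring.

    Works whether or not the last vertex repeats the first: a repeated
    closing vertex only adds a zero-length edge and a zero area term.
    """
    twice_area = 0.0
    perimeter = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter
```

For a 2 by 1 rectangle this returns an area of 2.0 and a perimeter of 6.0.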



ec2-xxxxx.amazonaws.com> ls -la /root/clustr-test
total 172
drwxr-xr-x 2 root root 4096 2009-04-07 03:14 .
drwxr-xr-x 5 root root 4096 2009-04-07 02:22 ..
-rw-r--r-- 1 root root 52208 2009-04-07 03:14 clustr-test.dbf
-rw-r--r-- 1 root root 97388 2009-04-07 03:14 clustr-test.shp
-rw-r--r-- 1 root root 4140 2009-04-07 03:14 clustr-test.shx

Now copy the shapefiles back to your computer and terminate your 
EC2 instance (or you might be surprised when you get your next billing 
statement from Amazon).



ec2-xxxxx.amazonaws.com> scp -r /root/clustr-test \
you@your-computer:/path/to/your/desktop/

ec2-xxxxx.amazonaws.com> exit

your-computer> ec2-terminate-instances i-xxxxxxxx

I created this image (using the open source
QGIS http://www.qgis.org/ application) for all those points by running
Clustr multiple times with alpha numbers ranging from 0.05 to 603:
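Running Clustr once per alpha value is easy to script. Here is a minimal Python sketch that builds the command lines. It assumes the paths from the example above and uses only the -v and -a flags from Clustr's own help output. The geometric spacing (alpha ranges here span orders of magnitude), the step count and the output file naming are assumptions, not necessarily how these images were made.

```python
from typing import List


def alpha_values(lo: float, hi: float, steps: int) -> List[float]:
    """steps alpha values spaced geometrically from lo to hi, inclusive."""
    if steps < 2 or lo <= 0 or hi <= lo:
        raise ValueError("need steps >= 2 and 0 < lo < hi")
    ratio = hi / lo
    values = [lo * ratio ** (i / (steps - 1)) for i in range(steps)]
    values[-1] = hi  # avoid floating-point drift on the last value
    return values


def clustr_commands(points: str, out_dir: str, alphas: List[float]) -> List[List[str]]:
    """One clustr invocation per alpha, each writing its own shapefile."""
    commands = []
    for alpha in alphas:
        out = f"{out_dir}/clustr-{alpha:g}.shp"  # hypothetical naming scheme
        commands.append(["/usr/bin/clustr", "-v", "-a", f"{alpha:g}", points, out])
    return commands
```

On the EC2 instance, each command could then be run with subprocess.run(cmd, check=True), and the resulting shapefiles loaded as layers in QGIS.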




Here's another version rendered using the
nik2img http://code.google.com/p/mapnik-utils/wiki/Nik2Img
application and a custom style sheet, both included with the MFS
distribution:




Here's one of all the geotagged photos tagged
"route66 http://www.flickr.com/photos/tags/route66" (with alpha
numbers ranging from 0.001 to 0.5):




Apologies and big sloppy kisses to Stamen's own Mappr (first released in 2005). 



Or tagged
"caltrain http://www.flickr.com/photos/tags/caltrain", the
commuter train that runs between San Francisco and San Jose:




Meanwhile, Matt Biddulph at Dopplr http://www.dopplr.com/
has been generating a series of
visualizations http://www.flickr.com/photos/mbiddulph/tags/clustr/
depicting the shape of where to eat, stay and explore for the cities in their
Places http://blog.dopplr.com/2009/03/20/the-dopplr-new-york-release-rolling-out-the-social-atlas/ database. This is what
London http://www.dopplr.com/place/gb/london looks like:
Or: "London dopplr places, filtered to only places my social
network has been to,
clustrd http://www.flickr.com/photos/mbiddulph/3421922514/".

One of the things I like the most about Clustr is that it will generate 
shape(file)s for any old list of geographic coordinates. Now that most of the 
hassle of setting up Clustr has been (mostly) removed, the only question left 
is: What do you want to
render? http://magicalnihilism.wordpress.com/2009/04/06/my-first-cloudmade-map-style-lynchian_mid/



"They do not detail locations in space but histories of 
movement that constitute space." 

— Rob Kitchin, Chris Perkins http://cyberbadger.blogspot.com/2008/11/map-studies-manifesto-complete.html



If you're like me you're probably thinking something like "Wouldn't
it be nice if I could just POST a points file to a webservice running on the
AMI and have it return a compressed shapefile?" It sure would, so I wrote a
quick and dirty version http://github.com/straup/ws-clustr/tree/master (not included in the MFS AMI; you'll need to do that
yourself) in PHP but if there are any Apache hackers in the house who want
to make a zippy C version that would be even Moar Awesome.
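The core of such a service is bundling the three files Clustr writes (.shp, .shx and .dbf) into one compressed download. Here is a minimal Python sketch of just that step, using a hypothetical function name. It is not the PHP ws-clustr code, and it leaves out both the HTTP handling and the call to Clustr itself.

```python
import io
import zipfile
from pathlib import Path

# The three components Clustr writes for each shapefile.
SHAPEFILE_EXTENSIONS = (".shp", ".shx", ".dbf")


def zip_shapefile(shp_path: str) -> bytes:
    """Return a zip archive (as bytes) of a shapefile's .shp/.shx/.dbf files.

    Raises FileNotFoundError if any of the three components is missing.
    """
    shp = Path(shp_path)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for ext in SHAPEFILE_EXTENSIONS:
            part = shp.with_suffix(ext)
            # Store only the file name so the archive unpacks flat.
            zf.write(str(part), arcname=part.name)
    return buf.getvalue()
```

A web handler would write these bytes back with a Content-Type of application/zip.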

If you don't want to use the MFS AMI and would rather just install 
Clustr on your own machine instance, here are the steps I went through to get 
it to work on a Debian 5.0 (Lenny) AMI; presumably the steps are basically the
same for any Linux flavoured operating system: 



$> apt-get update
$> apt-get install libcgal-dev
$> apt-get install libgdal1-dev
$> apt-get install subversion
$> svn co http://code.flickr.com/svn/trunk/clustr/
$> cd clustr
$> make
$> cp clustr /usr/bin/
$> clustr -h

clustr 0.2 - construct polygons from tagged points
written by Schuyler Erle
(c) 2007-2008 Yahoo!, Inc.

Usage: clustr [-a <n>] [-p] [-v] <input> <output>

-h, -? this help message
-v be verbose (default: off)
-a <n> set alpha value (default: use "optimal" value)
-p output points to shapefile, instead of polygons

If <input> is missing or given as "-", stdin is used.
If <output> is missing, output is written to clustr.shp.
Input file should be formatted as: <tag> <lon> <lat>\n
Tags must not contain spaces.



Just like that! 
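Since the help output spells out the input format (one "<tag> <lon> <lat>" line per point, tags without spaces), a points file is easy to generate from your own data. Here is a minimal Python sketch, using a hypothetical function name. It takes (tag, lat, lon) rows, writes them in Clustr's lon-before-lat order, and rejects tags that contain whitespace.

```python
from typing import Iterable, TextIO, Tuple


def write_points(rows: Iterable[Tuple[str, float, float]], out: TextIO) -> int:
    """Write (tag, lat, lon) rows as Clustr input lines: "<tag> <lon> <lat>".

    Returns the number of lines written. Raises ValueError for a tag that is
    empty or contains whitespace, since Clustr tags must not contain spaces.
    """
    count = 0
    for tag, lat, lon in rows:
        if not tag or any(ch.isspace() for ch in tag):
            raise ValueError(f"bad tag {tag!r}: tags must be non-empty with no spaces")
        # Note the order: longitude first, then latitude.
        out.write(f"{tag} {lon} {lat}\n")
        count += 1
    return count
```

Clustr reads stdin when <input> is "-", so you could write into an io.StringIO and pass its getvalue() to subprocess.run(["clustr", "-a", "0.001", "-", "out.shp"], input=..., text=True, check=True) instead of writing a file first.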




2009-04-11T22:13:35-0700










In a spinny bar 



spinny bar is spinning 



notes and links from Museums and the Web 2009



No one asked me to do a recap presentation for Museums and the Web but
since I enjoyed the one I did for
ETech http://aaronland.info/talks/#etech09 so much I just decided to
abuse the format and do a "dog-eared" conference post from my notes and
conversations.

• http://www.archimuse.com/mw2009/



http://www.flickr.com/photos/straup/sets/72157616956991619/



excessive orientation 



(compromised by)



From Max Anderson's opening keynote, which was excellent and inspiring 
and seemed to act as the touchstone for the rest of the conference. 



http://www.artbabble.org/video/moving-virtual-visceral-maxwell-l-andersons-plenary-address-museums-and-web-2009



foster 
projection 



From Max Anderson's opening keynote. 

• http://www.artbabble.org/video/moving-virtual-visceral-maxwell-l-andersons-plenary-address-museums-and-web-2009



http://openobjects.blogspot.com/2009/04/max-anderson-indianapolis-museum-of-art.html



never 

willingly 

outsource 

creativity 



From Max Anderson's opening keynote. This rang especially true with me 
since it was part of the point I was trying to get across when I spoke at Museums 
and the Web last year and the IMA http : / /www . imamuseum . org are proof of 
what a small team that gives a shit can accomplish with a little bit of support. 

• http://www.artbabble.org/video/moving-virtual-visceral-maxwell-l-andersons-plenary-address-museums-and-web-2009

• http://www.archimuse.com/mw2008/papers/straup_cope/straup_cope.html 



http://www.aaronland.info/talks#mw08



#passiveinvitations 



This was from the social media session and was a comment by Joe Hoover 
about placeography and creating projects as passive invitations. I also like it as a 
way to describe the way Twitter was used as a very public and good-natured 
hecklebot throughout the conference. 

• http://www.archimuse.com/mw2009/papers/baker/baker.html

• http://www.slideshare.net/mnHistoricalSociety/collaborative-history-creating-and-fostering-a-wiki-community?type=presentation

• http://www.placeography.org/index.php/Main_Page

• http://search.twitter.com/search?q=mw2009

• http://joi.ito.com/joiwiki/HeckleBot



object' db jwiki 



The social media session was heavily focused on the use of wikis in a
museum context and two things stuck out: 1) How many people are starting to use
MediaWiki and how many of them can already imagine the possibility of using it
"in-house"; that is, effectively replacing their existing and highly-specialized CMSes
and having curators and the "community" work on the same document. 2) The
progress that the Semantic MediaWiki people have made. I say that not because I
want to spray triples all over everything but because they are slowly building out the
tools to hide the vagaries of entering structured data from people.

• http://www.archimuse.com/mw2009/sescal/sescal_20090416.html

• http://semantic-mediawiki.org/

http://www.mediawiki.org/wiki/Extension:Semantic_Forms



oral history 

document history 

cloud history 



Darren Peacock asked about this in the "psychogeography and storytelling" 
unconference session and it's just a nice thought exercise. I don't expect there are 
any answers (yet) but it was an idea that kept cropping up in conversations. 

• http://conference.archimuse.com/forum/unconference_sessions_whats_wh 



http://www.archimuse.com/mw2009/bios/au_235013222.html



the weird dancing lady 
and the question of art 



One of the things I started getting on about the second night, at the bar 
covered in moose heads in the building said to be designed by Kurt Vonnegut's 
grandfather, was printed FAQs under individual works of art. Why not actually
answer the question: Why is this considered art? Why not try to answer the question 
outside of the formal language of art discourse? Why not see what questions people 
are really asking and then, at least for some of the questions, let visitors answer 
some of them themselves? 

This seemed all the more interesting to me because there is an LED/video 
installation near the bar showing a stylized woman swaying her hips back and forth, 
in an endless loop. It's a bit creepy really and from a distance looks like a weird and 
slightly disjoint attempt at public art. It prompted a lot of jokes until someone 
pointed out that it really was capital-A art and then everyone shut up. Because, you
know, it was... art.



http://www.flickr.com/photos/_mia/3444407039/in/pool-mw2009/



deeds of gifts 



Josh Greenberg, from the NYPL, and I talked about this in the context of the 
Commons, the desire (and pitfalls) of people wanting to put their work in the public 
domain and generally the uncharted territory called "backing up the web". 



http://www.epistemographer.com/



attaching a scene 



This was a phrase Richard Urban mentioned in the context of some semantic 
markup language for collections whose name I've forgotten. I'm less concerned 
with the mechanics than with the idea of attaching a whole scene, something more 
than a series of staccato tags or keywords, to a piece. Like a short story or a winter 
coat, which would have made artists like Francis Bacon cringe but it tickled my 
"magic words" bell which I always enjoy. 

It is also an interesting avenue when applied to maps and Josh (Greenberg) 
had just finished showing me the work that Schuyler (Erle) has been doing building
tools for the NYPL historical maps collection. 

• http://www.inherentvice.net/

• http://dev.maps.nypl.org/warper/

http://mappinghacks.com/2009/04/20/talks-on-the-research-web-and-on-sms-in-the-developing-world/

http://delicious.com/straup/magicwords



they can't make you win 
but 

they can 

interfere 

to keep you 

playing 



Nina Simon talking about casino theory but applied in a non-creepy way to 
museums. 

• http://www.archimuse.com/mw2009/papers/simon/simon.html

• http://prezi.com/30512/

http://www.museumtwo.blogspot.com







return 
books 




return 

awesome 

books 





Nina Simon had two "library" slides in her talk. One was a fancy-pants
RFID-enabled library in the Netherlands where both the books and the drop-off
shelves were programmed to automatically tag a book ("good", "bad", "sad",
"mad", etc.) when placed on a particular shelf and the other was a mocked-up
photo of an old skool book drop on the side of a library where the labels had been
changed to read "return books" and "return awesome books". I liked Nina's better.

• http://www.archimuse.com/mw2009/papers/simon/simon.html

• http://prezi.com/30512/

http://www.museumtwo.blogspot.com



last.fm for museums 



Richard (Urban) suggested this during Nina Simon's talk and all I can say is: 
Um... fuck yeah! 

• http://www.inherentvice.net/



http://lastfm.com



follow bacon 



This was the other seed that I left in the ear of anyone who cared to listen. I want
to be able to "friend", "follow", whatever, individual works of art in a collection. I want
to know when a painting goes on display or back to the storage facility. Never mind
big traveling exhibitions; institutions lend out individual works all the time and I
want to know when something that I'm interested in is going to be on display
nearby. It's the same principle as Dopplr, really: if a work I follow travels to Los Angeles
then I might make the effort to visit it too.



museums are the new amusement parks



One of Neb's comments in his talk at 
ETech http://www.aaronland.info/weblog/2009/03/14/buckets/#etech09 
this year was that an amusement park afforded you the freedom to side-step some
of the thornier issues surrounding ubiquitous (and physical) computing and the 
sensor world because you were working with "willing participants". I'm just 
saying... 

• http://www.archimuse.com/mw2009/sessions/index.html#EVT135000949



http://www.overmorgen.com/weblog/2009/03/13/ben_cerveny.php



an historical sensation



Frequency 1550, Waag Society. 

• http://www.archimuse.com/mw2009/papers/vandijk/vandijk.html

• http://www.slideshare.net/museumsandtheweb/out-there-connecting-people-places-and-stories?type=powerpoint



http://freq1550.waag.org/



confession in front of a mirror



Rituals, Waag Society. 

• http://www.archimuse.com/mw2009/papers/vandijk/vandijk.html

• http://www.slideshare.net/museumsandtheweb/out-there-connecting-people-places-and-stories?type=powerpoint

• http://www.waag.org/project/rituelen



http://www.vimeo.com/2804742



parks Canada has a new media department



Who knew? I mean... seriously, who knew? Now they just need a, whadaya 
call it, a "website". 



http://conference.archimuse.com/forum/mw2009_presentation_slides_gpst 



torque vs. power 



It turns out the spinny bar at the top of the conference hotel runs off of
nothing more than a single 5/8 horsepower engine and a pair of gear reducers that
push the whole thing around on a rail. There's a lesson in that.
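The lesson is the old one about torque and power: power is torque times rotational speed (P = τω), so a gear reducer gives up speed to multiply torque while the power stays the same. A back-of-the-envelope sketch, assuming a hypothetical 1,750 rpm motor and hypothetical reducer ratios chosen so the bar turns about once an hour (the real figures weren't part of the tour), and ignoring friction losses:

```python
import math

WATTS_PER_HP = 745.7  # one mechanical horsepower, approximately


def torque_nm(power_w: float, rpm: float) -> float:
    """Torque in newton-metres for a given power and shaft speed (tau = P / omega)."""
    omega = rpm * 2 * math.pi / 60  # rev/min -> rad/s
    return power_w / omega


motor_power = 5 / 8 * WATTS_PER_HP  # the 5/8 hp motor, roughly 466 W
motor_rpm = 1750.0                  # hypothetical motor speed
ratios = (300, 350)                 # hypothetical ratios of the two gear reducers

# Reducers in series multiply: the output turns 105,000x slower and,
# with losses ignored, delivers 105,000x the torque at the same power.
total_ratio = ratios[0] * ratios[1]
output_rpm = motor_rpm / total_ratio

print(f"motor:  {torque_nm(motor_power, motor_rpm):.2f} N·m at {motor_rpm:.0f} rpm")
print(f"output: {torque_nm(motor_power, output_rpm):,.0f} N·m, one turn every "
      f"{1 / output_rpm:.0f} minutes")
```

The power is identical at both ends of the gearbox; only the split between torque and speed changes.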

The other lesson is that so-called private tours, of anything really, are the
most interesting, and they help me belabour the point that curators and Art Professionals
are about a million times more interesting when they let their guard down (read: are
drunk) and speak simply, rather than in the language of "professional discourse",
about the things that they are passionate about.

• http://en.wikipedia.org/wiki/Torque



http://www.flickr.com/search/groups/?w=817967%40N24&q=spinny+bar&m=pool



start small but start



It's a good thing Paula's talk was so interesting because otherwise I'd give 
her shit for dropping the "switch" and "wookie" slide from the materials she posted 
online. 

• http://www.archimuse.com/mw2009/papers/bray/bray.html

• http://www.slideshare.net/paulabrary/flickr-commons-open-licesing-and-the-future-for-collection?type=presentation



http://www.flickr.com/photos/_mia/3452647726/



critical friends 



This was from Brian Kelly's "stop and make sure we haven't painted
ourselves into a Web 2.0 corner" talk. I don't actually remember how he segued
from that into the idea of critical friends but it's a lovely phrase.

• http://www.archimuse.com/mw2009/papers/kelly/kelly.html

• http://www.slideshare.net/museumsandtheweb/time-to-stop-doing-and-start-thinking-a-framework-for-exploiting-web-20-services?type=powerpoint

• http://ascii.textfiles.com/archives/1961



a graph that goes like this L is just a graph of the internet



Attributed to Seb Chan. 



• http://www.powerhousemuseum.com/dmsblog/index.php/2009/04/27/mw20 
clouds-switches-apis-geolocation-and-galleries-a-shoddy- 

SUmmaryl http : / /www . powerhousemuseum . com/dmsblog/ index . php/2 09/ 
c louds- switches -apis- geoloc at ion-and-galleries -a- shoddy- 
summary/ 



do one thing 



Just fucking do it. 



• http://museum-api.pbworks.com/The-MW2009-challenge



what would brooklyn do?



No one from the Brooklyn Museum could make it to the conference this year,
which is doubly sad since they seemed to be present in most people's conversations
and cleaned up at the awards ceremony, so I'll just point to this interview that Mike
Ellis did with them:

• http://conference.archimuse.com/forum/mw2009_best_web_sites_selected 

• http://www.vimeo.com/4180587



http://electronicmuseum.org.uk/2009/04/16/the-brooklyn-museum-api-qa-with-shelley-bernstein-and-paul-beaudoin/



PLAY ART LOUD 



I wish I had seen this talk. 

• http://www.archimuse.com/mw2009/papers/moad/moad.html

• http://www.slideshare.net/museumsandtheweb/rob-stein-charles-moad-ed-bachta-museums-and-cloud-computing?type=powerpoint

• http://www.artbabble.org/



2009-04-29T23:31:40-0700



The Interpretation of Bias 




These are the slides from my presentation at Museums and the Web 2009
about the work Flickr did generating shapefiles out of geotagged photos. There are
no notes to speak of since this was still a time when I delivered talks ad lib. In
retrospect that was ... a thing. There is also an entire
paper http://www.museumsandtheweb.com/mw2009/papers/cope/cope.html on
the subject which I've included verbatim below with the slides.



"A map is, in its primary conception, a conventionalized picture of the 
Earth's pattern as seen from above." — Erwin Raisz 



"Every map is someone's way of getting you to look at the world his or 
her way" — Lucy Fellows 




This is a story about naming things. 



Trend kittens vs. sunsets for 2008 




According to Flickr.






Who's on first? 



Geocoding is the act of converting a named place or address into machine-readable
coordinates, typically a latitude and longitude, which web maps then
commonly draw using the Mercator projection. This is Raisz's "conventionalized pattern".

Reverse-geocoding is the act of taking a latitude and longitude and
converting it back into a named place. This is Lucy Fellows' getting you to "look at
the world her way".

Geotagging is the act of assigning geographic metadata to a photograph. As 
location information is increasingly stored in databases as a first-class data type, the 
way dates are, the phrase "geotagging" might now be considered misleading. The
term evolved at a time when few systems allowed users to index geographic data 
explicitly and so, taking matters into their own hands, they simply used existing 
tagging infrastructures to store, query and retrieve location information using ad hoc 
techniques. And the name stuck. 



When users geotag photos on Flickr they typically first geocode an address
to locate the place where they want to say a photo was taken, and Flickr then
reverse-geocodes that point in order to display the name of the place the photo was taken in.



Geocoding is the art of inferring meaning from a multiplicity of written forms, but it 
is costly to perform and not a particularly efficient way to store and retrieve 
documents, photos in Flickr's case, that have been geo-referenced, particularly 
when the initial geocoding may have been incorrect or used only to begin a more 
fine-grained positioning. 
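One small part of that "multiplicity of written forms" is that many spellings name the same place, and the payoff of geocoding is collapsing them to a single identifier. A minimal sketch, assuming a hypothetical hand-built alias table and made-up IDs (not real WOE IDs); a real geocoder also has to cope with street addresses, typos and names that are genuinely ambiguous:

```python
import re
from typing import Dict, Optional, Tuple

LatLon = Tuple[float, float]

# Hypothetical gazetteer: place ID -> (display name, representative point).
PLACES: Dict[int, Tuple[str, LatLon]] = {
    1001: ("San Francisco", (37.7749, -122.4194)),
    1002: ("San Jose", (37.3382, -121.8863)),
}

# Hypothetical alias table: normalized written form -> place ID.
ALIASES: Dict[str, int] = {
    "san francisco": 1001,
    "san francisco ca": 1001,
    "sf": 1001,
    "san jose": 1002,
    "san jose ca": 1002,
}


def normalize(text: str) -> str:
    """Lower-case, turn punctuation into spaces and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def geocode(text: str) -> Optional[Tuple[int, LatLon]]:
    """Map a written place name to (place ID, point), or None if it is unknown."""
    place_id = ALIASES.get(normalize(text))
    if place_id is None:
        return None
    return place_id, PLACES[place_id][1]


print(geocode("San Francisco, CA"))  # (1001, (37.7749, -122.4194))
print(geocode("Gotham"))             # None
```

Once every variant points at one ID, storing and querying are cheap; the costly, error-prone part is building and maintaining the table of variants in the first place.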




Which means: This is a story about naming things. 
It is easy to get caught up in the rhetoric of the digital revolution and believe
in what I think of as the illusion of addressability. Everything is just ones and zeroes
and can therefore be given a unique identifier, goes the argument. The IPv6
standard, for example, supports 2^128 possible IP addresses, leading many to claim
every object created and every human born will be issued their own identifier and,
in turn, the world will finally be connected in a seamless Web devoid of ambiguity.
Unfortunately, the only "people" who relate to the world this way are robots.
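The arithmetic behind that claim is easy to check: an IPv6 address is 128 bits long, compared with 32 bits for IPv4, and the size of each address space follows directly from the width.

```python
# Address-space sizes follow directly from the address width in bits.
ipv4_addresses = 2 ** 32
ipv6_addresses = 2 ** 128

print(f"{ipv4_addresses:,}")    # 4,294,967,296
print(f"{ipv6_addresses:.2e}")  # 3.40e+38
```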




Once upon a time, I worked at a small Internet service provider where we all 
shared the responsibility of fielding technical support calls. One day I answered the 
phone and was asked to explain to a person only just beginning to use the Web why 
the "whitehouse.com" Web site was full of porn. This was a reasonable question, 
since most people were unaware that domain names are simply conversational 
short-cuts for the numeric addresses that actually identify sites on the Internet. I also 
explained that the same "name" may exist in multiple domain spaces, none of which
needed to know anything about the other, and that the actual United States
government Web site was located at whitehouse.gov.
As it happens, the whitehouse.com Web site is no longer a porn site; instead
it is a news and information portal, but that only serves to illustrate the still-fugitive
nature of "named places", even on the Web. That transience is further reflected in
the commercial nature of the Internet which requires that ownership of a domain 
name be renewed at a yearly cost, and failure to do so is all it takes for another 
interested party to claim "your" name. 
"Histories of movement..."
The problem is not that whitehouse.com and whitehouse.gov have
different IP addresses. The problem is that all names are shortcuts for interlocking
and constantly evolving sets of ideas, assumptions and relationships that computer
science struggles, and generally fails, to keep pace with. The same issues manifest
themselves in daily life as the association of French Champagne producers,
straddling both sides of the debate over globalization and the politics of identity and
terroir, takes out full-page advertisements in U.S. magazines (New Yorker 20090119,
pg. 7) decrying the use of the name "Champagne" by wines produced in California.
"...that constitute place."





Long before the Internet exploded into everyone's lives we were collecting 
photos in shoeboxes and writing dates and place names on the back of each image; 
fussing over the time it took to do but eventually regretting the decision not to. 
Place is history, and names are a reflection of the experiences we share with close 
relations, the larger community and even our own past. For all its ambiguity and 
shifting meaning, the name we give a place is the air that a representation of that 
place breathes. 



Geocoding a name, when someone searches for, or geotags, a photo, is only 
one-half of the problem. It allows you to fix a photo to a map, but how then do you 
connect that spot to memory and the history of the event? 
"We need to understand the difference between location and place.
Computers and mobiles are very good at location, but we describe where 
we are as place, where culture meets location." — Matt Jones 



























How does it work?



























Flickr works with a large database of places, historical and contemporary, 
called GeoPlanet. Every place is assigned a unique identifier called a Where On 
Earth (WOE) ID and contains information about its ancestors, children, siblings and
neighbouring places. For scaling and performance reasons we opted to store only 
pointers to individual WOE IDs in the Flickr database itself. 



county / maybe 



locality / and airports (and "hollywood") 



neighbourhood 



This meant that we had to boil the ocean a little in order to trim the number 
of possible members in any given location hierarchy. Eventually we settled on the 
following: Neighbourhoods; Localities; Counties (optional); Regions; Countries; 
Continents.
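The hierarchy, and the choice to keep only pointers to WOE IDs on the Flickr side, can be sketched as a tiny data model. Everything here is an assumption for illustration: the `Place` class, the made-up IDs and the `photo_places` helper are not Flickr's or GeoPlanet's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# The place types Flickr settled on, smallest to largest (county is optional).
HIERARCHY = ["neighbourhood", "locality", "county", "region", "country", "continent"]


@dataclass
class Place:
    woe_id: int
    name: str
    placetype: str
    parent_id: Optional[int] = None


# Hypothetical stand-in for the external places database, with made-up IDs.
PLACES: Dict[int, Place] = {p.woe_id: p for p in [
    Place(1, "North America", "continent"),
    Place(2, "United States", "country", 1),
    Place(3, "California", "region", 2),
    Place(4, "San Francisco", "locality", 3),  # the optional county level is skipped here
    Place(5, "Mission", "neighbourhood", 4),
]}


def ancestors(woe_id: int) -> List[Place]:
    """Follow parent pointers from a place up to its continent."""
    chain: List[Place] = []
    place = PLACES.get(woe_id)
    while place is not None:
        chain.append(place)
        place = PLACES.get(place.parent_id) if place.parent_id is not None else None
    return chain


def photo_places(root_woe_id: int) -> Dict[str, int]:
    """What a photo row would store: place type -> WOE ID, nothing more."""
    return {p.placetype: p.woe_id for p in ancestors(root_woe_id)}


print(photo_places(5))
# {'neighbourhood': 5, 'locality': 4, 'region': 3, 'country': 2, 'continent': 1}
```

Names, bounding boxes and everything else stay with the places database; the photo only needs the handful of IDs to join against it later.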




It is worth noting that this is only one possible hierarchy, and some of the 
choices we've made were due solely to the mechanics of operating a site as large as 
Flickr. If only as an exercise, a critic might argue that our model is biased towards a 
philosophy of liberal economic governance and traditional capitalist land 
ownership. 




Although it would be wrong to ascribe that much motive to our actions (we 
simply started the geo project with an existing data source that had been originally 
developed for use by government agencies and worked with what we had), it is 
interesting to consider the possible facets, still present, in an otherwise seemingly 
rigid hierarchy. 




A simple example is to contrast the way that Flickr and FireEagle (a Web 
application for collecting and sharing personal locative information) handle 
"localities" since the two sites share an almost identical hierarchy of places. Flickr
treats anything with neighbourhoods as a locality, so in our model Duncans Mills, 
CA (pop. 84) and Mexico City (pop. 19M) are assumed to be the same "type" of 
place. FireEagle does not. If you authorize a third-party application to access 
information about your whereabouts at a city level, there is an expectation, 
assuming that you share an expectation that cities are "big", that your actual location 
will be suitably obfuscated (or "fuzzed"), and in a town of 84 people there's not a lot 
of room to get fuzzy in. 
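How little room there is can be put in numbers. Suppose, purely as an assumption (this is not a description of how FireEagle actually fuzzes), that sharing at the city level means reporting the center of the place's bounding box; the worst-case error for anyone inside the box is then the distance from that center to a corner. The bounding boxes below are hypothetical, one roughly village-sized and one roughly megacity-sized:

```python
import math

EARTH_RADIUS_KM = 6371.0


def distance_km(a, b):
    """Equirectangular approximation; good enough at town and city scales."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    return EARTH_RADIUS_KM * math.hypot(x, lat2 - lat1)


def fuzz_to_place(bbox):
    """Report the bbox center and the worst-case error (center to corner) in km.

    bbox is (min_lat, min_lon, max_lat, max_lon).
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    center = ((min_lat + max_lat) / 2, (min_lon + max_lon) / 2)
    return center, distance_km(center, (max_lat, max_lon))


# Hypothetical bounding boxes: a village-sized box and a megacity-sized one.
tiny_town = (38.490, -123.060, 38.500, -123.045)
big_city = (19.050, -99.360, 19.600, -98.940)

for name, bbox in (("tiny town", tiny_town), ("big city", big_city)):
    _, worst = fuzz_to_place(bbox)
    print(f"{name}: reported point is at most ~{worst:.1f} km from anyone inside")
```

Under a kilometre of fuzz in a place with 84 residents is barely a fuzz at all.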




With all that in mind, when a Flickr user drops a photo on the map: 




We calculate a search radius based on the map's zoom level, which is 
mapped to its corresponding place type in the Flickr hierarchy. 




We query GeoPlanet for all the places of that type (say, neighbourhoods) 
that intersect the query radius. 




We filter the list to only those places whose bounding box overlaps the 
center point, factoring in a degree of allowable fuzziness. 




If the center point falls outside a bounding box by a hundred meters, then it's 
usually still worth considering. 




We then iterate over the second list, measuring the distance from the center
point to both the logical and the political center point of each bounding box.




As a rule, the bounding box with the shortest distance wins, but the process
is often messier and involves testing whether one bounding box is contained by
another, or checking a particular administrative relationship (the town where mail is
typically delivered, in the case of very small places) against the geographic reality.




If none of those tests succeed in finding a suitable location, we determine the 
parent of the place type we've just tested and try again until we've exhausted all the 
possible place types in our hierarchy. 




Once we have established a "root" location we then query for its ancestors 
and store each along with its WOE ID in the Flickr database. 




For example, if you've tried to geotag a photo in the woods of Siberia and 
we can't find a matching town, we should at least be able to tell you the photo was 
taken in Siberia or, failing that, Russia. 
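The steps above can be sketched in a few dozen lines. This is a simplified illustration rather than Flickr's code: the zoom thresholds, the 100 m fuzz, the in-memory list standing in for the GeoPlanet radius query and the made-up places are all assumptions; ranking uses only the bounding-box center rather than separate logical and political centers; and the messier containment and postal tests are left out. The example also shows the Siberia-style fallback up the hierarchy.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)
HIERARCHY = ["neighbourhood", "locality", "county", "region", "country", "continent"]
# Hypothetical zoom-level thresholds for the starting place type.
ZOOM_PLACETYPES = [(15, "neighbourhood"), (11, "locality"), (6, "region"), (0, "country")]
FUZZ_KM = 0.1  # still consider points up to ~100 m outside a bounding box


@dataclass
class Place:
    woe_id: int
    name: str
    placetype: str
    bbox: BBox
    parent_id: Optional[int] = None

    @property
    def center(self) -> Tuple[float, float]:
        return ((self.bbox[0] + self.bbox[2]) / 2, (self.bbox[1] + self.bbox[3]) / 2)


def distance_km(a, b) -> float:
    """Equirectangular approximation of the distance between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    return 6371.0 * math.hypot(x, lat2 - lat1)


def outside_by_km(pt, bbox: BBox) -> float:
    """Distance from a point to the nearest point of a bbox; 0 if inside (ignores the antimeridian)."""
    nearest = (min(max(pt[0], bbox[0]), bbox[2]), min(max(pt[1], bbox[1]), bbox[3]))
    return distance_km(pt, nearest)


def reverse_geocode(pt, zoom: int, places: List[Place]) -> List[Place]:
    """Return the best 'root' place for pt followed by its ancestors, or []."""
    by_id = {p.woe_id: p for p in places}
    start = next(t for z, t in ZOOM_PLACETYPES if zoom >= z)
    for placetype in HIERARCHY[HIERARCHY.index(start):]:
        # Places of this type whose box, give or take the fuzz, covers the point.
        candidates = [p for p in places
                      if p.placetype == placetype and outside_by_km(pt, p.bbox) <= FUZZ_KM]
        if not candidates:
            continue  # nothing suitable: try the parent place type
        # The candidate whose center is closest to the point wins.
        place: Optional[Place] = min(candidates, key=lambda p: distance_km(pt, p.center))
        chain = []
        while place is not None:
            chain.append(place)
            place = by_id.get(place.parent_id) if place.parent_id is not None else None
        return chain
    return []


# Hypothetical places with rough, made-up bounding boxes.
PLACES = [
    Place(1, "Asia", "continent", (-10.0, 25.0, 80.0, 180.0)),
    Place(2, "Russia", "country", (41.0, 19.0, 82.0, 180.0), 1),
    Place(3, "Siberia", "region", (45.0, 60.0, 78.0, 180.0), 2),
    Place(4, "Hypothetical Town", "locality", (56.00, 84.90, 56.05, 85.00), 3),
]

print([p.name for p in reverse_geocode((61.0, 100.0), 12, PLACES)])
# ['Siberia', 'Russia', 'Asia']: no town nearby, so we fall back up the hierarchy
```

Storing the whole chain, not just the winner, is what lets a photo still answer "which country?" even when the smaller place was a guess.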




Sometimes, though, after all of that work we still choose to display the 
wrong place. 




"Location is an 80/20 problem where the 20 really matters." 
Aaron Straup Cope
Until recently we've only ever been able to work with bounding boxes, a
limiting function of the available data from our provider. Despite that, we have been
surprisingly successful at mapping location to place even in the case of
neighbourhoods. But there have always been mistakes, and no one is very tolerant
of mistakes about "place". Never mind so-called disputed places (Kashmir, the West
Bank, Cyprus, etc.): all neighbourhoods are "disputed" around the edges. This is
often true of localities, as well. Our experience, reverse geocoding photos at Flickr,
has been that there are few better ways to pick a fight than to tell someone what
neighbourhood they are in and be wrong.
History boxes





The problem of course is that even if we mapped every combination of 
latitudes and longitudes, multiplied by an infinite number of decimal points, to a 
single place, people still wouldn't agree on the answer. In the same way that a point 
is really just a very small bounding box, a single point is also a flattening of the 
history of that place and, ultimately, there is only so much human subtlety you can, 
literally, codify into a computer program.
-34.6055, -58.4719 - Google Maps Open Street Maps on Flickr




immunities of authori 



Instead of simply trying to keep pace with all of human history and prejudice
as a series of cascading if/else statements, what if we laid our cards (the named
places that we think a pair of latitude and longitude coordinates might be) on the
table and, when we were wrong, gave people the chance to tell us what they meant,
and learned from that? What if the next time you geotagged a photo, we compared
where we think that place is against the places that you've told us are nearby? If not 
you, then your contacts? What if every single person on Flickr points out that a 
neighbourhood, or town, is just plain wrong? 




By adding a relatively small change to the site, allowing people to indicate 
that the place we had associated with a location was incorrect and allowing them to 
choose from a list of available options, we were able to better reflect their 
understanding of the world and begin to map facts on the ground rather than from 
on high. In the first week alone, we received one hundred thousand corrections! 
Depending on your point of view, this is either a testament to community-driven 
data, and so-called neo-geography, or proof that everything we've done to date was 
broken and wrong. 
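The questions above suggest a lookup order for corrections: yours first, then your contacts', then everyone's, and only then the original answer. A minimal sketch, assuming corrections are bucketed into grid cells roughly a kilometre across; the grid size, the `Corrections` class and its methods are hypothetical, not how Flickr actually stores or applies corrections.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]


def cell_for(lat: float, lon: float, size: float = 0.01) -> Cell:
    """Bucket a point into a grid cell about a kilometre across (hypothetical size)."""
    return (int(lat // size), int(lon // size))


class Corrections:
    """Hypothetical store of 'this spot is really in place X' corrections."""

    def __init__(self) -> None:
        # cell -> user -> the WOE ID that user said the spot belongs to
        self.by_cell: Dict[Cell, Dict[str, int]] = defaultdict(dict)

    def correct(self, user: str, lat: float, lon: float, woe_id: int) -> None:
        self.by_cell[cell_for(lat, lon)][user] = woe_id

    def lookup(self, user: str, contacts: Iterable[str],
               lat: float, lon: float, default: int) -> int:
        """Prefer your own correction, then your contacts', then everyone's."""
        votes = self.by_cell.get(cell_for(lat, lon), {})
        if user in votes:
            return votes[user]
        contact_set = set(contacts)
        for pool in ([v for u, v in votes.items() if u in contact_set], list(votes.values())):
            if pool:
                return Counter(pool).most_common(1)[0][0]
        return default  # no corrections nearby: keep the reverse-geocoder's answer
```

Keying corrections to a grid cell rather than to a single photo means one person's fix also informs the next photo taken a few doors down.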
"We should be mapping information that in some ways has been
historically unmappable because it is 1) not valued or is 2) actively seen 
as threatening or is 3) simply too hard to map using traditional tools." — 
Anselm Hook 




I tell these stories because as of this writing Flickr has over 100 million 
geotagged photos, each of which has up to six unique place (WOE) IDs associated 
with it. Over time we've wondered: if we plotted all the geotagged photos associated 
with a particular WOE ID, would we have enough data to generate a mostly
accurate contour of that place? Not a perfect representation, perhaps, but something 
more fine-grained than a bounding box. It turns out we can, effectively rendering 
the contour of all the points associated with a place into a recognizable shape, using 
software we developed called Clustr. 




Clustr is a thin wrapper around the open source Computational Geometry 
Algorithms Library (CGAL) and uses a technique called "alpha shapes" to calculate 
the shape formed by a set of points: 



"Imagine a huge mass of ice-cream making up the space ... and 
containing the points as "hard" chocolate pieces. Using one of those 
sphere-formed ice-cream spoons we carve out all parts of the ice-cream 
block we can reach without bumping into chocolate pieces, thereby even 
carving out holes in the inside (eg. parts not reachable by simply moving 
the spoon from the outside). We will eventually end up with a (not 
necessarily convex) object bounded by caps, arcs and points. If we now 
straighten all "round" faces to triangles and line segments, we have an 
intuitive description of what is called the alpha shape..." — Tran Kai 
Frank Da, Mariette Yvinec 
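Clustr itself relies on CGAL, but the spoon in that quote can be sketched directly. In the classic definition, an edge between two points is on the boundary of the alpha shape when some circle of radius alpha passes through both with no other point strictly inside it: the spoon touches two chocolate pieces without bumping into a third. The sketch below is a brute-force illustration of that definition (cubic time, fine for a handful of points, hopeless for millions of photos), not how Clustr or CGAL compute it; note that here alpha is a radius, whereas CGAL's alpha parameter is a squared radius.

```python
import math
from itertools import combinations
from typing import List, Set, Tuple

Point = Tuple[float, float]


def circle_centers(p: Point, q: Point, r: float) -> List[Point]:
    """Centers of the (up to two) circles of radius r passing through p and q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    d = math.hypot(dx, dy)
    if d == 0 or d > 2 * r:
        return []
    mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    h = math.sqrt(max(r * r - (d / 2) ** 2, 0.0))  # midpoint-to-center distance
    ux, uy = -dy / d, dx / d                        # unit vector perpendicular to pq
    return [(mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)]


def alpha_shape_edges(points: List[Point], alpha: float) -> Set[Tuple[int, int]]:
    """Boundary edges (as index pairs) of the alpha shape, alpha being a radius."""
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        for c in circle_centers(points[i], points[j], alpha):
            # The edge is exposed if this circle has no other point inside it.
            empty = all(
                math.dist(c, points[k]) >= alpha - 1e-9
                for k in range(len(points)) if k not in (i, j)
            )
            if empty:
                edges.add((i, j))
                break
    return edges


square = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
print(sorted(alpha_shape_edges(square, 10.0)))  # the four sides of the square
```

Shrinking alpha lets the spoon carve deeper into the point cloud, opening concavities and holes or splitting it apart; as alpha grows without bound the shape converges on the convex hull.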




The results have been stunning, and while we can draw a near perfect outline 
of the United States or France or Texas using no other geographic information than 
the locations associated with photos, many, if not most, of the shapes we create look 
a little weird. Possibly even "wrong". This is both okay and to be expected for a few 
reasons: 

• Sometimes we just don't have enough geotagged photos in a spot to
make it possible to create a shape. Even when we do have enough points
to create a shape, there sometimes aren't enough to create one that you'd
recognize as the place where you live. We chose to publish those
shapes anyway because it shows both what we know and don't know
about a place, and it encourages users to help us fix mistakes.



• We did a bad job reverse-geocoding photos for a particular spot and
they've ended up associated with the wrong place. We've learned quite
a lot about how to do a better job of it in the two and a half years we've
been doing this, but human awareness is fickle and does not always
lend itself to being formalized.



• Sometimes, the data we have for trying to work out what's going on is
just bad or out of date, and we rely on users pointing out what is
obvious to them but novel and unexpected to us.

• We are not very sophisticated yet in how we assign the size of the alpha
variable when we generate shapes. As far as we can tell, no one else
has done this sort of thing so, as with reverse-geocoding, we are
learning as we go. For example, with the exception of continents and
countries, we boil all other places down to a single contiguous shape.
We do this by slowly cranking up the size of the ice cream scoop; this
in turn can lead to a loss of fidelity. There is a lot left to learn.
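Cranking up the scoop until a place hangs together in one piece can be sketched without drawing any shapes. With alpha as a radius, the alpha shape has the same connected pieces as the union of disks of radius alpha around the points, so two points end up in the same piece exactly when a chain of points, each within 2 × alpha of the next, links them. Counting connected components while alpha grows is therefore enough. The starting value and growth factor below are hypothetical; how Clustr actually steps alpha isn't described here.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def count_components(points: List[Point], alpha: float) -> int:
    """Connected pieces when points within 2 * alpha of each other are linked."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    components = len(points)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= 2 * alpha:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    components -= 1
    return components


def contiguous_alpha(points: List[Point], alpha: float = 0.01, growth: float = 1.5) -> float:
    """Grow alpha (hypothetical start and step) until the shape is a single piece."""
    while count_components(points, alpha) > 1:
        alpha *= growth  # a bigger scoop bridges bigger gaps, at the cost of detail
    return alpha


# Two hypothetical clusters of points, 4.9 units apart.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0)]
print(count_components(points, 0.1))  # 2: two tight clusters
print(contiguous_alpha(points))       # first alpha in the schedule with 2 * alpha >= 4.9
```

The loss of fidelity follows directly: the alpha that finally bridges the widest gap is also large enough to smooth over every smaller inlet and hole.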




Does the "shape" of Florida, or of Italy, include the waters that lie between 
the mainland and the surrounding islands? It's not usually the way we imagine the 
territory that a place occupies, but the warping of the coast of Massachusetts by 
people taking, and geotagging, photos while on whale-watching boats is not an
entirely inaccurate depiction of place either. On the other hand, including the ocean 
between California and Hawaii as "part of" the United States would be kind of 
dumb. 




"They do not detail locations in space but histories of movement that 
constitute place." — Rob Kitchin, Chris Perkins 




More recently, while generating visualizations of these place shapes, we've 
noticed some interesting patterns. If we draw the shape of the city of Paris and then, 
on top of that, draw the shapes of all the city's child neighbourhoods, we see a richer 
and subtler definition of its boundaries. 




The first outline maps roughly to the extremities of the RER, the commuter 
train that services Paris and the surrounding suburbs. This is a fairly accurate 
representation of the "greater metropolitan" area of Paris, reflected in both popular 
folklore and government administrivia as more and more people shift from rural to 
urban living. The rest, taken as a whole, follows closer to the shape of the old city 
gates that most people think of when asked to imagine Paris. Which one is right? 
Both, obviously! 



Cities long ago stopped being defined by the walls that surround(ed) them. 
There is probably no better place in the world to see this than Barcelona which first 
burst out of its Old City with the construction of the Eixample at the end of the 19th 
century, and then again, after the wars of the 20th century, pushed further out 
towards the hills and rivers that surround it. 




There are lots of reasons to criticize urban sprawl as a phenomenon, but 
sprawl, too, is still made of people who over time inherit, share and shape the 
history and geography they live in. Whether it's Paris, Los Angeles, William 
Gibson's dystopic "Boston-Atlanta Metropolitan Axis" (BAMA) or the San
Francisco "Bay Area," they all encompass wildly different communities whose 
inhabitants, in spite of the grievances harboured towards one another, often feel as 
much of a connection to the larger whole as they do to whatever neighbourhood, 
suburb or village they spend their days and nights in. 



That's one reason it's so interesting to look at the shape of cities and see 
how they spill out beyond the boundaries of traditional maps and travel guides. In 
the example above, the shape for Paris completely engulfs the commune of Orly, 20 
kilometers to the south of central Paris: this makes a certain amount of sense. It also
contains Orly airport which isn't that notable except that Flickr treats airports as 
though they were cities in their own right; the realities of contemporary travel mean 
that airports have evolved from being simple gateways to capital-P places with their 
own culture, norms and gravity. So, now you have cities contained within cities 
which most people would tell you are just neighbourhoods. 



mmmmm....donuts 
As of this writing, we've finished rendering the third batch of shapes for the
corpus of places in the Flickr database and, looking ahead, are wondering whether we
should also be rendering shapes based on the "relationship" of one place to another.
Rendering the shape of the child places for a city or a country would allow you to
see a city's "center" but also provide a way to filter out the parts of a shape,
typically a country's, with a low "Earthiness" (read: water) quotient.

The issue is not to prevent, or correct, shapes that provide a false view,
because I don't think they do. Schuyler Erle, developer of the Clustr application,
observed, while we were getting all this stuff to work in the first place and testing
the neighbourhoods that border the San Francisco Bay, that they are really "the
shapes of people looking at the city". They are each different, but the same.



But maybe we should also map the neighbourhoods that aren't considered 
the immediate children of a city but which overlap its boundaries. What if you could 
call an API method to return the list or the shape of a place's "cousins"? What could 
that tell us about a place? 
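One way to read "cousins": neighbourhoods that overlap a city's footprint but hang off some other parent. A sketch under loud assumptions: `cousins` is a hypothetical function, not a Flickr API method; places are reduced to bounding boxes rather than shapes; and the places, IDs and boxes are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


@dataclass
class Place:
    woe_id: int
    name: str
    placetype: str
    bbox: BBox
    parent_id: Optional[int] = None


def overlaps(a: BBox, b: BBox) -> bool:
    """True if two bounding boxes share some area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def cousins(city: Place, places: List[Place]) -> List[Place]:
    """Hypothetical: neighbourhoods overlapping the city that belong to another parent."""
    return [p for p in places
            if p.placetype == "neighbourhood"
            and p.parent_id != city.woe_id
            and overlaps(p.bbox, city.bbox)]


city = Place(10, "Big City", "locality", (0.0, 0.0, 1.0, 1.0))
places = [
    city,
    Place(11, "Old Town", "neighbourhood", (0.4, 0.4, 0.6, 0.6), parent_id=10),
    Place(21, "Edge Park", "neighbourhood", (0.9, 0.9, 1.2, 1.2), parent_id=20),
    Place(22, "Far Hills", "neighbourhood", (3.0, 3.0, 3.5, 3.5), parent_id=20),
]
print([p.name for p in cousins(city, places)])  # ['Edge Park']
```

With real shapes instead of boxes, the same query would presumably pick out far fewer false cousins along the edges.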
"The "long here" that Flickr represents back to me is becoming only
more fascinating and precious as geolocation starts to help me
understand how I identify and relate to place. The fact that Flickr's
mapping is now starting to relate location to me the best it can in human 
place terms is fascinating ... but where it falls down it falls down 
gracefully, inviting corrections and perhaps starting conversation." — 
Matt Jones 
We could have released these shapes before the corrections project, but then
it would have been little more than a closed cycle, where our misinterpretations of
place were relayed back to our data provider and so on. By giving users the ability
to signal their interpretation of place, we not only break the feedback loop, but also
provide a way for those corrections to be fed back into Flickr's reverse-geocoding
engine to better geotag photos in the future: we use the wisdom of the community to
give shape and nuance, and voice, to the authority of the dataset that we are working
from.



As with any visualization of aggregate data, there are likely to be areas of
contention. One of the reasons we're excited to make the data available via the Flickr
Application Programming Interface (API) is that much of it simply isn't
available anywhere under a non-commercial license. And the users and the
developer community who make up Flickr have a gift for building magic on top of
the API, so we're doubly excited to see what people do with it.
Clustr, the software used to generate shapes, was released under an open
source software license and is designed to work with any set of points derived
from latitudes and longitudes. In the future we hope to add a feature to assign an
abstract weighting to any individual point to affect how it is interpreted by the 
application. For Flickr, this weighting might be whether or not its associated photo 
was corrected or whether the location was offered as a suggestion by another user. 
Another limiting agent might be whether a photo was geotagged by a user who 
could be considered a resident of that place, rather than a tourist or visitor. 
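Point weighting is described above only as a plan, so this is purely a sketch of one form it could take: each point gets a weight from hypothetical flags mirroring the signals mentioned (a corrected place, a location suggested by another user, a photographer who seems to be a resident), and points below a threshold are dropped before the shape is drawn. The flags, weights and threshold are invented for illustration and are not part of Clustr.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GeoPoint:
    lat: float
    lon: float
    corrected: bool = False   # the photo's place was corrected by a user
    suggested: bool = False   # the location was suggested by another user
    resident: bool = False    # the photographer appears to live there


def weight(p: GeoPoint) -> float:
    """Hypothetical weighting scheme; the numbers are illustrative only."""
    w = 1.0
    if p.corrected:
        w += 1.0
    if p.suggested:
        w += 0.5
    if p.resident:
        w += 0.5
    return w


def points_for_shape(points: List[GeoPoint], threshold: float = 1.5) -> List[Tuple[float, float]]:
    """Keep the (lat, lon) of points whose weight meets the threshold."""
    return [(p.lat, p.lon) for p in points if weight(p) >= threshold]


photos = [
    GeoPoint(48.857, 2.352),                  # a visitor's photo, no extra signals
    GeoPoint(48.860, 2.340, corrected=True),  # corrected by a user
    GeoPoint(48.850, 2.360, resident=True),   # taken by a local
]
print(points_for_shape(photos))  # [(48.86, 2.34), (48.85, 2.36)]
```

Dropping points is the bluntest use of a weight; a gentler variant might count a heavily weighted point more than once.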



But more than that, we hope other projects will start to map the shape of 
their projects and share them with the wider community. 



"The (Minnesota Historical) Society has the largest collection in the
universe of Minnesota fiction and many of these books create thinly
veiled places based on the author's experience with an authentic local
place."



What are the shapes of user-defined places? The first place you kissed your 
spouse? Napoleon's march into, and then back out of, Russia? Your daily commute?
Does the shape of New York City's "ground zero" extend beyond the city blocks 
excavated after the World Trade Center towers fell, to the places that people ran to, 
or to the vantage point from which a person saw events unfold? 



The Massachusetts Institute of Technology's SENSEable Cities project has 
been researching and visualizing the movement of tourists in Barcelona through the 
photos they've posted to Flickr, since 2007. What would it mean, not simply to plot 
those photos as a cloud of isolated events, but to give them shape, and meaning, as 
entirely new neighbourhoods or temporary cities in time, like Black Rock City 
which seems to emerge fully-formed out of the Nevada desert for the annual 
Burning Man event, only to disappear and "leave no trace" (except, as it happens, for
a lot of geotagged photos) ten days later? 




Stamen Design's Oakland Crimespotting is an interactive map for visualizing 
and understanding crimes in the city of Oakland. By filtering incident reports by 
date and type, a viewer is able to see both the shape of criminal behaviour in the city 
and also the shape of the city's response to it; for example, the seemingly clockwork
intervals of no activity in a neighbourhood followed by nearly block-by-block
reports of prostitution arrests. What is the shape of the history unseen in a place?



"Over 9,000 Londoners lost their lives to V2 rocket strikes in World War
2," writes Tom Taylor, creator of the rocketstrikes.iamnear.net Web site. "Below are 
the five ... rocket strike locations" nearest to Westminster. Financial institutions and 
lenders may want to pattern the world with spending habits and agency, but I'd like 
to use the same tools to see the pattern of nearby. We all have friends who've sat in 
the same seat on the same airplane flying back and forth between destinations, only 
to have a third friend, years later with the aid of the Internet and a GPS-enabled
device, see a photo of that seat and realize they are sitting only meters away. Not on 
the same plane, but in the same physical space that both planes occupied and the 
same place, the same anti-space, that anyone seated on a plane waiting to taxi from 
the terminal to the runway inhabits. 
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Hammock of interpretation 




You are here, so say all the maps. 




Place is history, and if the Internet is even half the "architecture of
participation" that its supporters claim, then maybe history need no longer be written
by the victors alone. Given the chance, what are the dinner-time, war-time and
drunken kitchen-party stories that the places we have known would tell?



What would they name? 



See also: the short version http://www.flickr.com/photos/straup/3447284267/



2009-04-18T10:27:18-0400

"Where place matters more and space matters less..."

Benjamin Bratton http://www.cityofsound.com/blog/2009/04/benjamin-h-brattton-postopolis-la.html



I will be at Museums and the Web http://www.archimuse.com/mw2009/ , this week, to talk about the work
we've been doing at Flickr around geotagging photos, reverse-geocoding and
shapefiles http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/
and, more broadly, notions of bias in, and the interpretation of, place. Plus, I get to
speak alongside the Philly History http://www.phillyhistory.org/PhotoArchive/ crew, which is extra-exciting!

The other thing that I'm excited about is being able to talk about how
Clustr http://code.flickr.com/svn/trunk/clustr/ , the open source tool we
use to generate shapefiles, is now bundled as part of the Maps From
Scratch http://www.mapsfromscratch.com/ Amazon EC2 AMI. There's a long
and detailed blog post about all that on the code.flickr
blog http://code.flickr.com/blog/2009/04/07/the-only-question-left-is/ but the short version is:



"We expressly chose to make Clustr an open-source project to share 
some of the tools we've developed with the community but it has also 
always had a relatively high barrier to entry. Building and configuring a 
Unix machine is often more than most people are interested in, let alone
compiling big and complicated maths libraries from scratch. Clustr on 
EC2 is not a magic pony factory but hopefully it will make the 
application a little friendlier." 



In that post I talked about wanting to be able to use Clustr by calling a
simple web service, so eventually I wrote the quickest and dirtiest implementation I
could think of: a PHP script that simply shells out to the Clustr application and then
returns the output (compressed). I encourage anyone who wants to get hung up on
the lack of "elegance" in that approach to port CGAL http://cgal.org/ to
PHP. Your efforts will be amply rewarded, I'm sure, but in the meantime this
already works:

$> curl -H 'x-clustr-alpha:0.00001' -v --data-binary '@/path/to/points.txt' \
  http://ec2-xxxxxxxx.compute-1.amazonaws.com/ws-clustr/ > ~/path/to/shapefile.tar.gz

WS-clustr.php http://github.com/straup/ws-clustr/tree/master is
available for anyone to download on GitHub, along with a handy README file for
getting it to work with the Maps From Scratch
AMI http://github.com/straup/ws-clustr/blob/2163ea6a8ad4bfce9405eb8a0dc3d6cda6ad7d35/README.mapsfromscral

Which is all good but you still need something to make shapes of. How about all the 
geotagged photos uploaded to Flickr on March 24, 2009: 

$> python flickr-tools/geotagged.for_day.py -c /path/to/flickr.cfg -d '2009-03-24' --clustr



That yields a file with 54,673 points that I can ask ws-clustr to plot. By
passing those points to ws-clustr with a variety of alpha sizes (11 times, to be
exact) I was able to generate the following image in
QGIS http://www.qgis.org/ :




The geotagged.for_day.py script is one of several Flickr-related
helper tools available for download on GitHub as part of the
flickr-tools http://github.com/straup/flickr-tools package.

So now what? Or rather: What if my mapsfromscratch/ws-clustr AMI isn't
already up and running and I want to generate hawt shapefile action? EC2 servers
are great for doing short, fast tasks but, if left running for days or weeks on end, they start
to incur noticeable fees. Fortunately, starting and stopping EC2 instances can be done
programmatically, so I wrote a client-side interface, in Python, to (ws) Clustr that
starts a new EC2 instance, exchanges a points file for a (compressed) shapefile and
then shuts the server down again. The code also checks to see if there is already a
running instance of the AMI you want to use and simply uses that one if available.

Like this: 



import time

from wsclustr import wsclustr

wsc = wsclustr('amzaccesskey', 'amzsecretkey')
wsc.startup('ami-xxxxx')

# Poll until the instance is up and ws-clustr is ready for requests.
while not wsc.ready():
    time.sleep(5)

shpfile = wsc.clustr('2009-03-24-geotagged.txt')

wsc.shutdown()
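The reuse-or-start behaviour described above can be sketched with the EC2 client from boto3 (describe_instances, run_instances, the instance_running waiter and terminate_instances). This is not py-wsclustr's code: the function names, the default instance type and the choice to terminate only an instance the sketch started itself (leaving a pre-existing one alone) are all assumptions. The client is passed in, so anything with the same methods will do.

```python
from contextlib import contextmanager


def find_running(ec2, ami_id):
    """Return the ID of a running instance launched from ami_id, or None."""
    resp = ec2.describe_instances(Filters=[
        {"Name": "image-id", "Values": [ami_id]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ])
    for reservation in resp["Reservations"]:
        for instance in reservation["Instances"]:
            return instance["InstanceId"]
    return None


@contextmanager
def clustr_instance(ec2, ami_id, instance_type="t3.micro"):
    """Yield the ID of an instance running ami_id.

    An already-running instance is reused and left running; otherwise a new
    one is launched and terminated on exit, so it can't sit idle for weeks.
    instance_type is an assumed default, not something ws-clustr requires.
    """
    instance_id = find_running(ec2, ami_id)
    started_here = instance_id is None
    if started_here:
        resp = ec2.run_instances(ImageId=ami_id, InstanceType=instance_type,
                                 MinCount=1, MaxCount=1)
        instance_id = resp["Instances"][0]["InstanceId"]
        # Block until EC2 reports the instance as running.
        ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    try:
        yield instance_id
    finally:
        if started_here:
            ec2.terminate_instances(InstanceIds=[instance_id])
```

With boto3 installed, `boto3.client("ec2")` would supply the client. Note that "running" only means the machine is up: inside the `with clustr_instance(...)` block you would still need to look up the public DNS name and wait for the web server itself to answer.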

Which was great, except for the part where I sent the same 1.3MB file across
the wire 11 times in order to create all the shapefiles for the image above. EC2 is
pretty cheap as far as these things go but sooner or later all that data and traffic is
going to add up and Amazon won't hesitate to send you a bill for it. So, now both
ws-clustr and py-wsclustr support an equally bare-bones caching layer for
the data the client sends to the server. As far as the Python side of things goes, it looks
and acts like this:

shpfile1 = wsc.clustr('2009-03-24-geotagged.txt', alpha=0.001, try_cache=1)
shpfile2 = wsc.clustr('2009-03-24-geotagged.txt', alpha=0.01, try_cache=1)
shpfile3 = wsc.clustr('2009-03-24-geotagged.txt', alpha=0.1, try_cache=1)

If the cached version exists on the server then the shapefile will be generated
using that, without the client having to send all that data again. If the cached version
does not exist then the server will return an HTTP 404 error and the client will retry
the request with the data. Caches are stored and referenced with identifiers
generated from the contents of the data file. Specifically: "clustr-" + the value of
md5sum(2009-03-24-geotagged.txt). If you look behind the curtain, what's actually
being sent to the server is something like this:

$> curl -H 'x-clustr-alpha:0.01' -H 'x-clustr-cache:clustr-c77cae39a4f7e506a9cc8205176f1239' \
  http://ec2-xxxxxxxx.compute-1.amazonaws.com/ws-clustr/ > ~/path/to/shapefile.tar.gz
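The cache-key-then-retry exchange described above fits in a few lines of Python. In this sketch the `send(headers, body)` transport, the fake server and the function names are hypothetical stand-ins rather than py-wsclustr's API, and it assumes the server derives the same "clustr-" + MD5 key from the uploaded bytes when it stores them.

```python
import hashlib


def cache_key(data):
    """The cache identifier: "clustr-" plus the MD5 hex digest of the points data."""
    return "clustr-" + hashlib.md5(data).hexdigest()


def clustr_with_cache(send, data, alpha):
    """Ask for a shapefile by cache key first; on HTTP 404, resend with the data.

    send(headers, body) is a hypothetical transport returning (status, payload);
    body is None when only the cache key is being sent.
    """
    headers = {"x-clustr-alpha": str(alpha), "x-clustr-cache": cache_key(data)}
    status, payload = send(headers, None)
    if status == 404:
        # Cache miss: the server has never seen these bytes, so upload them.
        status, payload = send({"x-clustr-alpha": str(alpha)}, data)
    if status != 200:
        raise RuntimeError("ws-clustr request failed: HTTP %d" % status)
    return payload


# A fake in-memory server, standing in for ws-clustr.
uploads = []
stored = set()


def fake_send(headers, body):
    if body is None:
        return (200, b"shp") if headers["x-clustr-cache"] in stored else (404, b"")
    uploads.append(body)
    stored.add(cache_key(body))
    return 200, b"shp"


points = b"placeholder points data"
for alpha in (0.001, 0.01, 0.1):
    clustr_with_cache(fake_send, points, alpha)
```

Three alpha values, but the points data crosses the wire only once; the second and third requests carry just the headers.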

The Housekeeping Department would like me to remind you that it is left as
an exercise to people running their own ws-clustr servers to take care of
cleaning up their system's temporary directories, where the cache files are stored.
ws-clustr was built to run on an EC2 instance where it is expected that the
server, along with all its data, will be torn down long before disk space becomes an
issue, but since it's just a PHP script there's nothing to prevent it from being used
outside of Amazon's cloud castle. Just something to keep in mind.
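For anyone running ws-clustr outside EC2, a cron-able cleanup could look something like this. It assumes the cache files sit directly in the system temp directory and are named with the "clustr-" cache key; both are guesses, so check where your server actually writes its cache before deleting anything.

```python
import os
import tempfile
import time


def prune_clustr_cache(directory=None, max_age_hours=24, prefix="clustr-"):
    """Delete cache files older than max_age_hours; return the paths removed."""
    directory = directory or tempfile.gettempdir()
    cutoff = time.time() - max_age_hours * 3600
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Only touch regular files that look like ws-clustr cache entries.
        if (name.startswith(prefix) and os.path.isfile(path)
                and os.path.getmtime(path) < cutoff):
            os.remove(path)
            removed.append(path)
    return removed
```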






You know what is ASTRONOMICALLY 
FUCKING EXPENSIVE? Leaving an 
EC2 instance running for two weeks 
doing nothing, by mistake. 





Likewise with caching the output, or supporting something like If-Modified-Since
headers, neither of which is done yet, for two reasons. The first is that Clustr is just
Really Fast so I'd rather spend my time solving other problems than caching for
caching's sake. The second is that there's no (automatic) expectation that the EC2
server running ws-clustr will ever be running long enough to warrant caching
shapefiles by their alpha value and the contents of their data. Again, if people start
to use the server outside of EC2 then it might be warranted but until then there are
problems better solved sooner.

Now that you've sucked down shapefiles in Python it would be useful to do
something with them. I like using Zachary Forest Johnson's
shpUtils.py http://indiemaps.com/blog/2008/03/easy-shapefile-loading-in-python/
library to do the actual parsing (though the ESRI shapefile
spec http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf is
actually pretty simple if you need to write a specialized one-off). Here is some
sample code to parse a shapefile returned by ws-clustr and munge it into a list of
Shapely http://gispython.org/shapely/manual.html polygon objects.
Shapely is useful for doing all sorts of hairy geometry and head-scratchy math but
the shorter way to think about it is that it's basically Just Awesome.

The complete code listing is included in the
examples http://github.com/straup/py-wsclustr/tree/f8327aa189c3fc5eb848f9a0a5db3a26c41790a5/examples
directory of the py-wsclustr project on GitHub.



import os
import tarfile

import shpUtils
from shapely.geometry import Polygon

t = tarfile.open(shpfile)
t.extractall()

# Because the tarfile.getnames method always seems to
# return the list of files in random order, build the
# path to the .shp file from the archive's own name
# (this assumes it unpacks to <name>/<name>.shp).

name = os.path.basename(shpfile).replace(".tar.gz", "")
shp = "%s/%s.shp" % (name, name)

polys = []

for record in shpUtils.loadShapefile(shp):
    for part in record['shp_data']['parts']:

        # Collect the (x, y) vertices for this ring.
        poly = [(pt['x'], pt['y']) for pt in part['points']
                if 'x' in pt and 'y' in pt]

        polys.append(Polygon(poly))
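For the specialized one-off mentioned above, here is a minimal sketch of a reader for just the Polygon records (shape type 5) in a .shp file, following the layout in the ESRI whitepaper: a 100-byte header whose file length is big-endian and counted in 16-bit words, then records with big-endian headers and little-endian contents. It ignores the .shx index and .dbf attributes and skips every other shape type, so treat it as an illustration rather than a replacement for shpUtils.

```python
import struct


def read_polygons(path):
    """Yield each Polygon record in a .shp file as a list of rings.

    Each ring is a list of (x, y) tuples. Null shapes and every other
    shape type are skipped.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    (file_code,) = struct.unpack(">i", data[0:4])
    if file_code != 9994:
        raise ValueError("not an ESRI shapefile: %s" % path)
    # The header stores the file length in 16-bit words, big-endian.
    file_len = struct.unpack(">i", data[24:28])[0] * 2
    offset = 100  # records start right after the fixed 100-byte header
    while offset < file_len:
        # Record header: record number and content length (16-bit words), big-endian.
        _rec_num, content_words = struct.unpack(">2i", data[offset:offset + 8])
        content = data[offset + 8:offset + 8 + content_words * 2]
        offset += 8 + content_words * 2
        (shape_type,) = struct.unpack("<i", content[0:4])
        if shape_type != 5:  # 5 == Polygon
            continue
        # Bytes 4-35 hold the record's bounding box; NumParts/NumPoints follow.
        num_parts, num_points = struct.unpack("<2i", content[36:44])
        parts = struct.unpack("<%di" % num_parts, content[44:44 + 4 * num_parts])
        start = 44 + 4 * num_parts
        coords = struct.unpack("<%dd" % (2 * num_points),
                               content[start:start + 16 * num_points])
        points = list(zip(coords[0::2], coords[1::2]))
        # Each entry in parts is the index of a ring's first point.
        bounds = list(parts) + [num_points]
        yield [points[bounds[i]:bounds[i + 1]] for i in range(num_parts)]
```

Each ring it yields is a plain list of coordinate pairs, so it can be handed to Shapely's Polygon in the same way as the shpUtils-based listing.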
Or, if you're like me, you'll want to display all those shapes using
ModestMaps http://www.modestmaps.com/ . Here is the code used to generate
the image below, modulo the part where the modestMMarkers package is not
public yet. This code is still under active development, to display the cluster-y bits
of the turkishMMap (remember
that? http://www.aaronland.info/weblog/2009/01/24/meat/#papercamp-parti ),
but that's not really the point. The point is that there are now
a few more "nubby bits" in the toolbox with which to build things. I happen to have
a bit of a map fetish.



import os
import tarfile

import PIL.Image
import PIL.ImageChops
import PIL.ImageEnhance
import ModestMaps
import shpUtils
import modestMMarkers   # not released yet, see below

# clustr is a wsclustr instance, set up as in the earlier example.

alphas = (100, 25, 10, 5, 1, .1, .01, .05, .001, .0005)

swlat = None
swlon = None
nelat = None
nelon = None

shapes = []

for a in alphas:

    shpfile = clustr.clustr('2009-03-24-geotagged.txt', alpha=a, try_cache=True)

    t = tarfile.open(shpfile)
    t.extractall()

    name = os.path.basename(shpfile).replace(".tar.gz", "")
    shp = "%s/%s.shp" % (name, name)

    records = shpUtils.loadShapefile(shp)
    polys = []

    for record in records:

        # this is a bit redundant since it only
        # needs to be calculated once but you get
        # the idea...

        data = record['shp_data']

        # Grow the overall bounding box; compare against None so
        # that a coordinate of 0.0 isn't mistaken for "unset".
        swlat = data['ymin'] if swlat is None else min(swlat, data['ymin'])
        swlon = data['xmin'] if swlon is None else min(swlon, data['xmin'])
        nelat = data['ymax'] if nelat is None else max(nelat, data['ymax'])
        nelon = data['xmax'] if nelon is None else max(nelon, data['xmax'])

        for part in data['parts']:

            poly = [{'longitude': pt['x'], 'latitude': pt['y']}
                    for pt in part['points'] if 'x' in pt and 'y' in pt]

            polys.append(poly)

    shapes.append(polys)

w = 6000
h = 4000

pr = ModestMaps.builtinProviders['BLUE_MARBLE']()
sw = ModestMaps.Geo.Location(swlat, swlon)
ne = ModestMaps.Geo.Location(nelat, nelon)
dims = ModestMaps.Core.Point(w, h)

mm_obj = ModestMaps.mapByExtent(pr, sw, ne, dims)
map_img = mm_obj.draw()

shp_img = PIL.Image.new('RGBA', (w, h), 'white')

# Hey look! This is modestMMarkers.py; it has not been released yet!!

poly = modestMMarkers.polylines.polyline(mm_obj)

# Draw every set of shapes in black on a white canvas...
for polys in shapes:
    shp_img = poly.draw_polylines(shp_img, polys, color=(0, 0, 0))

# ...then turn that into a mask: greyscale, boost the contrast and
# invert it so the drawn shapes become the opaque (white) parts.
mask = shp_img.convert('L')

enh = PIL.ImageEnhance.Contrast(mask)
mask = enh.enhance(2.5)

mask = PIL.ImageChops.invert(mask)

# Paste the map through the mask so it only shows where shapes were drawn.
img = PIL.Image.new('RGBA', (w, h), 'white')
img.paste(map_img, (0, 0), mask)




No, really. 

Like everything else, py-wsclustr http://github.com/straup/py-wsclustr/
is available for anyone to play with on GitHub. At some point in
the near future I will make sure that all these packages are also given a home on
aaronland.info http://www.aaronland.info/ , filed under Just In Case.



As an aside, I finally made my peace with EC2 and Amazon on the grounds 
that, at the end of the day, it's just a plain old Unix box with tailored build 
instructions that can be backed up and re-created like any other server and if you're 
not already backing up your machines then you've got bigger problems than 
whether or not Jeff Bezos wants all your base. Compare this to Google's AppEngine 
which looks really interesting but for "some" reason requires that you give them 
your fucking phone number to sign up for a developer's account. It's like a whole 
new and perverted twist on the 
honeypot http://en.wikipedia.org/wiki/Honeypot_(computing) some days... 



Meanwhile, come May I will be speaking about
Clustr http://en.oreilly.com/where2009/public/schedule/detail/7212 and
shapefiles and "communities of authority" at Where
2.0 http://en.oreilly.com/where2009 , in San Jose. In the talk-is-cheap-always-try-to-have-working-code
department, I had sort of imagined not being able
to get the HTTP client libraries for Clustr working so soon; now I'll just have to
dream up something new to share with people! If you've been thinking about
attending but needed a little more coaxing, the nice folks at O'Reilly have given me a
25% discount code (for the registration fee) to pass along: WHR09FSP.

In July, I am looking forward to returning to Vancouver and speaking at
GeoWeb 2009 http://geowebconference.org/ about the idea of
nearby http://blog.flickr.net/en/2009/02/24/an-abundant-present/ , and
history boxes http://www.slideshare.net/straup/history-boxes-presentation
and trying to encourage a more nuanced understanding of place that
can be read and traveled like a contour map of meaning. Or something like that.
There's a lot of twisty in that one so I am pleased to have the chance to try and give
a little more form to the idea. Indeed, there are still long and twisty blog posts about
nearby and history boxes and the importance of artifacts and the
Papernet http://www.flickr.com/photos/bopuc/3435085658/ to be written,
each of which will surely feed the talk.

But not tonight.

painting the bike shed in yak hair

24 sheets of PPPPPaper




I don't even like the Turkish map
fold http://www.geocities.com/david_j_rosen/mapfold/ , to be honest. Also,
this blog post is too long. And full of yaks.

RAWR! http://www.flickr.com/photos/mylesdgrant/3409555405/

Dave http://communicationnation.blogspot.com/ first showed it to
Matt http://magicalnihilism.wordpress.com/ , I think, who showed it to
George http://abitofgeorge.com/ and they all showed it to me as an
alternative to the PocketMod http://www.pocketmod.com/ layout, which is what
I'd been using to build Papernet http://www.aaronland.info/papernet
prototypes. As I said, I don't like it very much: You have to first tear or cut a sheet
of paper into a square and then there's a lot of weird folding. It's no easier to
explain than the funny press-fold-turn dance step for making PocketMod books and
in the end I just have something with an irregular shape and a pointy bottom that
needs to be folded (again) before I can put it in my pocket.



But, what the hell. I had just finished the ill-fated experiment with the
pocketMMap http://www.aaronland.info/weblog/2008/11/27/time/#hills
books at DesignEngaged
(DE08) http://www.designengaged.com/archive2008/2008/10/presentationsp .

I had done a quick and dirty experiment using the same code to render the Dopplr
places (Atom) feed for London, only to discover that it spanned 24 sheets of paper
because, while most of the points of interest were in central London, the rest were
forever and beyond in places like
Heathrow http://www.flickr.com/places/LHR . By then, I was open to
George's suggestion that maybe a reasonable design
constraint http://www.flickr.com/photos/george/sets/72157609615207103/
was to limit both the number of points listed and the area that they covered.

Strictly speaking I don't actually think that's a useful constraint because it
sort of hand-waves the problem (that you can't always "best of" away a perfectly
reasonable collection of densely clustered points) but it did seem like an interesting
way to approach the problem. The code that generated the DE08
pocketMMaps http://www.aaronland.info/weblog/2008/11/27/time/#hills
was pretty naive in its approach: It simply created a bounding box based on the
outlier points and drew a big rectangular map on to which restaurants and hotels
and funny stories http://maps.google.com/maps/ms?ie=UTF8&hl=en&msa=0&msid=106670048759200881360.0004565744030ff07d00e&ll=
were plotted. In practice this meant that at least half the map surface was negative
space, devoid of any markers, because 90% of the points formed a long crescent that
hugged just one side of Mont
Royal http://www.flickr.com/places/CA/QC/Montreal#montroyal .
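The "negative space" cost of a bounding box built from the outlier points is easy to put a number on. This hypothetical helper (not part of pocketMMap) lays a grid over the points' bounding box and reports the share of cells containing no points at all; a long crescent hugging one side of the box leaves most cells empty.

```python
def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) points."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def empty_fraction(points, cols=10, rows=10):
    """Share of grid cells inside the points' bounding box that hold no point."""
    min_x, min_y, max_x, max_y = bounding_box(points)
    width = (max_x - min_x) or 1.0    # avoid dividing by zero for a flat box
    height = (max_y - min_y) or 1.0
    occupied = set()
    for x, y in points:
        # Points on the max edge would index one past the grid; clamp them.
        col = min(int((x - min_x) / width * cols), cols - 1)
        row = min(int((y - min_y) / height * rows), rows - 1)
        occupied.add((col, row))
    return 1 - len(occupied) / (cols * rows)
```

For example, `empty_fraction([(0, 0), (1, 1)], cols=2, rows=2)` returns 0.5: two diagonal cells are occupied and the other two are empty map.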

Lots of empty map space makes for lots of pages, which makes for lots of
fussing with folding paper, which makes for no fun.

I have seen very large Turkish maps that can be folded into the size of a
cigarette pack and I have seen people glue many smaller ones together to form a
book but those are capital-A activities and all I've ever wanted is something I could
produce quickly and cheaply and shove in my pocket on the way out the door. I
drank the kool-aid along with everyone else and I happen to be excited about the
magic digital sensor world in my
pocket http://liftlab.com/think/nova/2008/10/11/design-engaged-2008-my-notes/
but I also want something to fall back on when the computers fuck up,
can't find the network or (more likely) run out of power.



Right now, that means consumer-grade printers that print on letter-sized sheets of
paper. And, in the case of Turkish maps, trying to squeeze 80+ points spread over
several square miles into a single map view eight or so inches across was a
non-starter.

So, working off of George's suggestion of creating smaller and more
intimate story-telling maps, with only a handful of touchstones and enough room for
a person to discover a place, why not try to recognize clusters based on the
proximity of one location to another and sort everything into groupings small
enough to fit on a single sheet of paper? Remember the 32-page Word document,
emailed and printed out, of restaurants and bars in Paris organized by neighbourhood
that was one of the sparks for the whole Papernet
dance http://www.aaronland.info/weblog/2006/12/17/meat/#papernet ?
Yeah, like that.




Which is what I'd really hoped to have finished in time for
PaperCamp http://bookcamp.pbworks.com/PaperCamp . It seemed straightforward
enough, at the time, which is a polite way of saying that's when the yak
shaving started. Also, printers lie.



But I digress. 

Most of this code gets written in a hurry, in the morning over coffee, with an
eye towards solving the particular task at hand rather than building a
possibility-space-elevator, and that was truer still of the code to make map-things for DE08. Which is
fine; the price of working that way is that sometimes you need to go back and
refactor everything to play nicely with a new idea. I'm okay with this largely on the
grounds that this kind of fussing and nit-picking, at least in nerd/programmer
circles, happens no matter what you do or how you start so there's not a lot of point
in pretending otherwise at the outset.

And sometimes you get stuck solving a problem that seemed to make sense
at the time but doesn't really apply anymore or is simply at cross purposes. Back
when I was working on the pocketMMap code I generated the index (or table of
contents) at the back as an image rather than as proper inline vector-based text. At
the time this seemed like the easy and clever thing to do (though I've since realized
that ReportLab can do what I need it to in pure PDF), and it proved useful because a
day or two before DE08 was set to start we decided to print big maps, to hang on
the wall, with big honking markers for each point of interest. So I needed something
to render text in a box with a fixed size as an image.

Which was, and is, kind of a nuisance; the sort of endless fussing over
details that makes print such a chore and makes people run for the warm embrace
and relative safety of HTML and CSS. I get it. I really do. But at least now I can add
simple (very simple) chunks of text to
ws-modestmaps http://modestmaps.com/examples-python-ws/ . Which was
actually the point, just a different one.




So, that was the first stumbling block: Trying to finish an abstracted set of
"draw me some rasterized text, in a box this size" functions and then
shoehorning them into a capital-P print project. It's one thing to write code from scratch
to just work for the task at hand and it's quite another to try and write code that can
predict the future.

As an aside, I learned during the four short days I worked in the fast food
industry that this is called "stage and project" or, alternately, capacity planning for
bad habits. I worked in a big-chain burger joint where the trick was to always have
enough burgers cooking at any one time that an order could be served within 15
seconds of having been placed. The catch, though, was that we were also supposed to
have enough burgers overcooking on the grill that we could use them to
make chili.



The second problem is that I tried to rationalize the code that generates
(so-called) pocketMMaps and turkishMMaps too soon. There is now a shared library
for the two packages that is classic kitchen-sink code with debatable subclasses and
naming conventions that started out as a wrapper for drawing polylines and other
markers on top of py-modestmaps http://www.modestmaps.com/ derived
images and somehow ended up containing feed parsers. Because, you know, that's
where the markers came from. Yeah, I know...

The road to Hell is paved with abstract intentions. 

So, that's the lesson for me. Not that there shouldn't be a proper code cleanup
but that there was nothing gained from doing it now. It would have been cleaner and
faster and easier to use the dreaded lib_copy_and_paste, clone entire
chunks of code and make the effort to leave notes and pointers and comments to
help refactor things when the dust had settled and not a moment before.

In the end it all works. It is really not pretty to see what's going on behind
the scenes and I am less convinced than ever that it's worth .... But it works.

"Printers lie." That's what Matt
Jones http://www.flickr.com/photos/straup/2328347447/ said to me when
we were commiserating over the pain involved in trying to wrestle with printer
margins and bleeds and the like. There's a reason everyone got so excited about the
web: It's not print, which is full of grues and demons. (It's the same reason, frankly,
that I still prefer drawing maps by hand, really...) In my case, I wanted to overlay guide
lines, for cropping and folding the paper, because after all it was a Turkish map and
that seemed like a useful addition. It was made worse by the need to play
stupid tricks with the over-sized page dimensions and image resolution and telling
printer drivers to scale to fit in order to make the rendered map tiles crisp enough
for viewing and large enough to include markers without succumbing to red-dot
fever.



But, here's the thing about this approach: It sucks.

It is a situation that's gradually getting better but until recently the only
alternative, if you wanted to do this stuff programmatically, was to write stuff in
PostScript (or LaTeX, if you're Blaine http://www.romeda.org/ ) which is
basically pure buzz-kill for any project. I take some small perverse joy in seeing
that all the work the XSL-FO http://www.dpawson.co.uk/xsl/sect3/
community did, and the pain I endured to generate index-card sized (duh duh
duh) printed
recipes http://eatdrinkfeelgood.info/tools/xsl/eatdrinkfeelgood-1.1-to-indexcard-fo/ ,
was actually the right approach. I say that having only just
recently discovered that work has begun anew on
FOP http://xmlgraphics.apache.org/fop/ , the only serious open-source
XSL-FO processor available.

This is a Very Good Thing because XSL-FO is designed to embed
SVG http://www.zvon.org/HowTo/Output/howto_jj_svg_17.html which
suddenly means that generating printed maps, whether it's using something like
Cloudmade's decidedly alpha SVG
tiles http://developers.cloudmade.com/projects/show/vector-tiles or
simply baking SVG maps using
Mapnik http://trac.mapnik.org/wiki/MapnikRenderers , is actually, well...
possible enough to be considered easy.
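To make the XSL-FO-plus-SVG idea concrete, here is a small sketch that wraps an SVG drawing in a single-page XSL-FO document using only Python's ElementTree. The page size, margin and the toy polygon are arbitrary choices. The element that carries the SVG is fo:instream-foreign-object, which is how XSL-FO embeds inline XML from other vocabularies.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
SVG = "http://www.w3.org/2000/svg"
ET.register_namespace("fo", FO)
ET.register_namespace("svg", SVG)


def fo(tag):
    """Qualify a tag name with the XSL-FO namespace."""
    return "{%s}%s" % (FO, tag)


def fo_page_with_svg(svg_root, width="8.5in", height="11in"):
    """Return an XSL-FO document (as a string) with one page holding svg_root."""
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "page", "page-width": width,
        "page-height": height, "margin": "0.5in"})
    ET.SubElement(page, fo("region-body"))
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "page"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    block = ET.SubElement(flow, fo("block"))
    # fo:instream-foreign-object carries inline non-FO XML such as SVG.
    foreign = ET.SubElement(block, fo("instream-foreign-object"))
    foreign.append(svg_root)
    return ET.tostring(root, encoding="unicode")


# A toy "map": a single polygon outline drawn as an SVG path.
svg = ET.Element("{%s}svg" % SVG, {"width": "400", "height": "400",
                                   "viewBox": "0 0 400 400"})
ET.SubElement(svg, "{%s}path" % SVG, {"d": "M 50 50 L 350 80 L 300 350 Z",
                                      "fill": "none", "stroke": "black"})
fo_doc = fo_page_with_svg(svg)
```

Saved to a file such as map.fo, the result can be handed to FOP on the command line (`fop -fo map.fo -pdf map.pdf`); how faithfully the embedded SVG is rendered is up to the processor.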

There are also, it's true, countless HTML-to-PDF style converters out there
but if you stop and think about it they are all just XSL-FO processors without
500-odd years of lessons learned and gotchas from the print world. This is not
necessarily a bad thing depending on the scope of your project but, really, I digress.




If the original pocketMMap exercise was completed in order to figure out 
where it would fail, this was an exercise in trying to figure out how to start making 
it better without worrying too much about the shiny. 

The basic design of the TurkishMMap was a square map cropped on its y
axis enough to fit a legend containing the markers listed on the map and a
throwaway gutter at the top of the page. It's a Turkish map, after all, so the sheet will need
to be torn to form a square. Which made trying to get all the margins to line up and be
equally spaced on all four sides of the map ... a waste of time, really.

So, eventually I stopped bothering. With the margins and the fold lines.
Really, with the Turkish map part entirely. You can still do all of that if you want but
what I started to realize, staring at all my failed print-outs, was that once the automagic
clustering of places had been done and any one sheet of paper only had about ten
items listed, I liked the layout as is. Unfolded.




Or rather, I didn't really care how it was folded, if at all. As documents that
are scoped to a bunch of places all relatively close to one another I can imagine
printing them all out in advance of a trip (for example) and then, just like when we
were in Paris in 2006, simply grabbing the two or three sheets that I think I might
need during the day and shoving them in my pocket on the way out the door.

I kept the large gutter at the top of each page for those people who really
want to make a proper Turkish map but also used the space to include a zoomed-out
map of the same area with the bounding box of the larger map highlighted to give
things a little more context.




Which is interesting because you start to hear echoes of the original
Papernet mockups for recipes and
wine http://www.aaronland.info/weblog/2006/12/17/meat/#papernet , not
that I think it's necessarily any more useful. As much as I love all the sexy folding
and magic books that appear before your eyes, the trouble they require to make
seems to be inversely proportional to the value people place on them as a thing. It's
far from a golden rule but generally things become artifacts in their use rather than
their making (or "configuring") and in that light it's still pretty hard to beat a single
printed sheet of paper and scribbled notes and other scraps of paper that begin to
orbit each other over time.

It doesn't necessarily make for great sharing, in the ways that we've come to
expect from hanging out on the Internet for a decade, but it's worth noting that I live
and breathe this stuff and still we take the same 32 pages of "stuff to do in Paris" that
we first printed out in 2006 every time we visit.

One of the last people to speak at PaperCamp, in London, was Beeker
Northam http://beeker.typepad.com/ and she did a short talk about her love
of books and in particular individual pages in books. Jeremy did a good job
describing what she said next so I'll just quote
him http://adactio.com/journal/1546 : "She photographs her books. There's
something about photographing them that's different to scanning them. She'd like to
have some kind of web-based way for people to share those bits of books that have
had an emotional impact on them but she hasn't found it yet."

This struck a chord with me for a couple of reasons. First, I'd kind of like to get
back to working on the web-based end of things for a while. I've spent about a year
working on the output and formatting end of things and it feels like it's time to work
on some of the tools for actually creating things; maybe giving the long-neglected
deliciousmaps http://www.aaronland.info/www/deliciousmaps/ project some
love. Second, because it was so god damn simple and simple usually wins. I am as
guilty as anyone of fetishizing the separation of form and content but maybe, just
maybe, it's not such a big deal. At least not to start with.

Maybe it's good enough to just assume people will scribble maps on the
backs of napkins and eventually get around to uploading them to some place where
they can be shared, where shared mostly means dragging and dropping on to some
sort of "canvas" to be (re)printed. Rinse and repeat. Somewhere further down the
eventuality stream those same maps could be traced in the same way raw GPS
traces are merged into OpenStreetMap http://www.openstreetmap.org/traces ,
for example. Sooner or later
someone will go to the trouble of sorting, arranging and rectifying all the data-bits
properly and then the tools for automagically creating new things will be even better
but that shouldn't prevent people from also doing the quick and simple thing.

I think Mike's "walking papers"
project http://mike.teczno.com/notes/walking-papers.html and Schuyler's
work building the NYPL map
rectifier http://mappinghacks.com/2009/04/20/talks-on-the-research-web-and-on-sms-in-the-developing-world/
are important in this regard. They are
the bridge-pieces that let people work outside the normally formal and tedious
constraints of software while providing a way to get all that data back into a
structured system. Which is pretty awesome, but I digress again.

The short version: It's still not possible to generate paginated pocketMMaps. 
Yet. In the meantime, here's what you can do: 



import turkishMMap 



This is the code that can, for example, read in a GeoRSS (or Atom) feed
containing 80-odd points http://maps.google.com/maps/ms?ie=UTF8&hl=en&t=h&oe=UTF8&msa=0&output=georss&msid=106670048759200881360
and generate a 14-page PDF file with a separate page for each of the 12 distinct clusterings
into which those points were sorted, a finishing "oddballs and orphans" page for
places that didn't fit anywhere else and a cover page listing all those pages and their
coverage area. The only difference from the
pocketMMap http://aaronland.info/python/pocketMMap/ code is that you
pass in an extra paginated=True argument:

from turkishMMap.providers import GeoRSS

tm = GeoRSS(8.5, 11)

tm.load_provider('OPENSTREETMAP')

tm.drawfeed('http://example.com/points.rss', paginated=True)

tm.save('DE08.pdf')

You can download a copy of the PDF
file /weblog/2009/05/02/yakshed/DE08-tm.pdf to see the whole thing, in its
clunky rasterized glory, but here are some sample images:



import modestMMarkers 

Dear god, help me, this is the kitchen sink library I mentioned above. It's 
really bad. Really. Bad. The good news is that there is only one public interface, for 
drawing polylines and markers, and that's not going anywhere. The gooder news is 
that the rest will eventually be moved into a generic toolbox package called
something equally stupid like modesTToolbox. 

The marker stuff is actually pretty useful, though. The super-quick example 
looks like this: 



import ModestMaps
from modestMMarkers import polylines

# provider, sw, ne, dims and polys are assumed to be defined elsewhere
mm_obj = ModestMaps.mapByExtent(provider, sw, ne, dims)
mm_img = mm_obj.draw()

poly = polylines.polyline(mm_obj)
mm_img = poly.draw_polylines(mm_img, polys)

If you follow this link, I've included example
code /weblog/2009/05/02/yakshed/flickr_shapes.py.txt that will fetch and
render the shape data (derived from geotagged photos on Flickr) for a given WOE
ID, as well as all the child WOE IDs contained within, and plot the whole thing on a
handy background map. When you run it you'll end up with something like this:




import clusterMMap 
This starts off as plain old K-Means http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/software.htm
clustering where, in the absence of a user-defined value of K, the square root of the number of markers
is used. After the initial clustering is done the results are filtered to prevent any
single cluster from containing more than nine points (that should probably be
configurable). Any that exceed the limit continue to be re-crunched (with a K value
of 2) until they meet expectations, and even then they are subject to an additional
distance test to ensure that outliers don't get grouped with something that's actually
too far away. That may mean that they end up as "orphan" points but there are
hoops to try and account for that too.



cl = clusterMMap.clusterMMap()
(clusters, orphaned) = cl.clusters(points)
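For anyone curious about the shape of that clustering step, here is a rough sketch in plain Python. It is not the clusterMMap code: the function names are made up, it runs Lloyd's k-means directly on (lat, lon) pairs using squared Euclidean distance (a simplification that ignores the curvature of the Earth), and it leaves out the extra outlier distance test. What it does show is K defaulting to the square root of the number of points and any cluster with more than nine points being re-split with K=2 until it fits.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's k-means on (lat, lon) tuples; returns the non-empty clusters."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid (squared Euclidean distance)
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # move each centroid to the mean of its cluster (keep it if the cluster is empty)
        updated = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return [c for c in clusters if c]


def cluster_points(points, k=None, max_size=9):
    """Cluster points, then keep splitting any cluster larger than max_size with K=2."""
    if not points:
        return []
    if k is None:
        k = max(1, int(math.sqrt(len(points))))  # default K: square root of the point count
    pending = kmeans(points, k)
    done = []
    while pending:
        cluster = pending.pop()
        if len(cluster) <= max_size:
            done.append(cluster)
            continue
        halves = kmeans(cluster, 2)
        if len(halves) < 2:
            done.append(cluster)  # cannot be split (e.g. duplicate points); give up
        else:
            pending.extend(halves)
    return done
```

Splitting with K=2 guarantees each pass makes the offending cluster strictly smaller, so the loop always finishes for distinct points.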

The clusters are then further simplified by generating a convex hull or, more
precisely, tested to see if there are enough points to create a hull. If not, all the points
in that cluster are treated as orphans. So, now we have a bunch of polylines and
orphan points. If the bounding box for any one polyline contains the bounding box
for another polyline or the bounding box of polyline x intersects polyline y (the
actual polyline, not the bounding box) then the two are merged. Finally, each
orphaned point is tested to see whether it is contained by any of the bounding boxes
for the remaining polylines (and added to that cluster if it is).
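A similarly rough sketch of the hull, merge and orphan steps, again in plain Python with hypothetical names. It uses Andrew's monotone chain for the convex hull, treats clusters with fewer than three non-collinear points as orphans, merges any group whose bounding box contains another group's bounding box, and adopts orphans into the first group whose bounding box contains them. The test for a bounding box intersecting the actual polyline is left out; something like Shapely would be the obvious tool for that.

```python
import itertools


def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices, or None if there is no hull."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return None
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]
    return hull if len(hull) >= 3 else None  # collinear points have no area


def bbox(points):
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def merge_and_adopt(clusters, orphans):
    """Returns (hulls, groups, leftover_orphans)."""
    groups, orphans = [], list(orphans)
    for c in clusters:
        if convex_hull(c) is None:
            orphans.extend(c)  # not enough points for a hull: treat them as orphans
        else:
            groups.append(list(c))
    # Merge groups while one group's bounding box contains another's.
    merged = True
    while merged:
        merged = False
        for i, j in itertools.permutations(range(len(groups)), 2):
            if bbox_contains(bbox(groups[i]), bbox(groups[j])):
                groups[i].extend(groups.pop(j))
                merged = True
                break
    # Adopt each orphan into the first group whose bounding box contains it.
    leftover = []
    for p in orphans:
        for g in groups:
            x0, y0, x1, y1 = bbox(g)
            if x0 <= p[0] <= x1 and y0 <= p[1] <= y1:
                g.append(p)
                break
        else:
            leftover.append(p)
    return [convex_hull(g) for g in groups], groups, leftover
```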

Is all that work really necessary? I don't know, but it seems to work so I'll 
keep poking at it until it doesn't or someone offers a more compelling cluebat: 




I've been known to talk about "false starts" a lot these days. I do that as
much as anything to remind myself why I spend six months holed up on these kinds
of projects. I'm not really disappointed. Too much, anyway. I would prefer to have
come out of this round with something a little more polished, but at the same time I
try to remember that's part of the process. In the end I have some useful pieces of
code that I can use elsewhere, a working prototype (which always helps in
understanding what to do next) and maybe a little bit of time to do something else for a
while.



The code itself is hosted on
aaronland http://www.aaronland.info/python and, assuming you've managed
to install py-cairo http://cairographics.org/pycairo/ and
ModestMaps http://www.modestmaps.com/ yourself, should just magically
install any other dependencies. (Actually, the clustering code requires
Shapely http://pypi.python.org/pypi/Shapely and
Numpy http://numpy.scipy.org/ but there are OS-specific packages for those
too.) I might put the clustering code up on GitHub http://github.com/straup/
but, right now, the rest seem like too much of a moving target to bother.

• turkishMMap.py 0.1 http://www.aaronland.info/python/turkishMMap/turkishMMap-0.1.tar.gz

• modestMMarkers.py 0.1 http://www.aaronland.info/python/modestMMarkers/modestMMarkers-0.1.tar.gz

• clusterMMap.py 0.1 http://www.aaronland.info/python/clusterMMap/clusterMMap-0.1.tar.gz

What's next? Aside from all the stuff mentioned above? Probably a spy 
novel... 
Flickr Shapefiles Public Dataset 1.0



Flickr Shapefiles Public Dataset 1.0 



This is what I said at Where 
2.0 http://en.oreilly.com/where2009/public/schedule/detail/7212 



The Shape of Alpha 



Silicon Roundabout 

Photos and videos taken nearby.




Aaron Straup Cope Where 2.0 May 2009 



airport poem 



aaronofmontreal 

Aaron Straup Cope 



"The problems they have with labeling and handling contested 
categories is a problem with all categorization systems since the 
world began. Metadata is worldview; sorting is a political act. [...] 
would love to avoid those problems if they could - who needs 
the tsouris? — but they can't. No one gets cataloging "right" in 
any perfect sense, and no algorithm returns the "correct" 
results. We know that, because we see it every day, in every 
large-scale system we use. No set of labels or algorithms solves 
anything once and for all; any working system for showing data 
to the user is a bag of optimizations and tradeoffs that are a lot 
worse than some Platonic ideal, but a lot better than nothing." 
county / maybe



"metropolitan area" 



and airports (and "hollywood") 



neighbourhood 



North America / 24865672 



United States / 23424977



California / 2347563 



San Francisco / 12587707 



"San Francisco Bay Area" / "we suck" 



San Francisco / 2487956 



The Mission / 2452334 





Alpha shapes! 



clustr 0.21 - construct polygons from tagged points 
written by Schuyler Erle

(c) 2007-2009 Yahoo!, Inc. 



de.flickr. 



Usage: clustr [-a <n>] [-p] [-v] <input> <output>
    -h, -?    this help message
    -v        be verbose (default: off)
    -a <n>    set alpha value (default: use "optimal" value)
    -p        output points to shapefile, instead of polygons
api = Flickr.API.API(...)

req = Flickr.API.Request(method='flickr.places.getInfo', woe_id=3534)
res = api.execute_request(req)

xml = elementtree.ElementTree.parse(res)
shpfile = xml.find("y/shapedata/url")
clustr = wsclustr.ec2(access_key='...', secret_key='...')
clustr.startup(ami='ami-4d769124')

for a in (100, 10, 1, 0.1):
    shpfile = clustr.clustr('/path/to/points.txt', alpha=a)

clustr.shutdown()
"We should be mapping information that in some ways has been historically
unmappable because it is 1) not valued or 2) actively seen as threatening or is 3)
simply too hard to map using traditional tools."




And this is what I said on the
code.flickr http://code.flickr.com/blog/2009/05/21/flickr-shapefiles-public-dataset-10/ blog:




The name sort of says it all, really, but here's the short version: 

We are releasing all of the Flickr
Shapefiles http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/
as a single download, available for use under the Creative Commons Zero
License http://creativecommons.org/publicdomain/zero/1.0/ . That's fancy-talk
for "public domain".



The long version is: 



To the extent possible under law, Flickr has waived all copyright and
related or neighboring rights to the "Flickr Shapefiles Public Dataset,
Version 1.0". This work is published from the United States. While you
are under no obligation to do so, wherever possible it would be extra-super-duper-awesome
if you would attribute flickr.com http://www.flickr.com/ when using the dataset.
Thanks!



We are doing this for a few reasons. 

• We want people (developers, researchers and anyone else who wants to
play) to find new and interesting ways to use the shapefiles and we
recognize that, in many cases, this means having access to the entire
dataset.

• We want people to feel both comfortable and confident using this data
in their projects and so we opted for a public domain license so no one
would have to spend their time wondering about the issue of licensing.
We also think the work that the Creative
Commons http://www.creativecommons.org/ crew is doing is
valuable and important and so we chose to release the shapefiles under
the CC0 http://wiki.creativecommons.org/CC0 license as a show of support.

• We want people to create their own shapefiles and to share them so
that other people (including us!) can find interesting ways to use them.
We're pretty sure there's something to this "shapefile stuff" even if we
can't always put our finger on
it http://www.flickr.com/photos/junku/sets/303691/ so if
publishing the dataset will encourage others to do the same then we're
happy to do so.




The dataset itself is pretty straightforward. It is a single 549MB XML file 
uncompressed (84MB when zipped). The data model is a simple, pared-down 
version of what you can already get via the Flickr 

API http://www.flickr.com/services/api/ with an emphasis on the shape
data. 

Everything lives under a single root places element. For example: 



<place woe_id="26" place_id="BvYpo7abBw" place_type="locality" place_type_id="7" label="Arvida, Que...">
  <shape created="1226804891" alpha="0.00015" points="45" edges="15" is_donuthole="0">
    <polylines bbox="48.399932861328, -71.214576721191, 48.444801330566, -71.157333374023">
      <polyline>
        <!-- points go here -->
      </polyline>
    </polylines>
    <shapefile url="http://farm4.static.flickr.com/3203/shapefiles/26_20081116_082a5655..." />
  </shape>
  <!-- and so on -->
</place>
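Given the size of the file, streaming it is probably the way to go. A minimal sketch using the standard library's ElementTree.iterparse, assuming the attributes shown in the excerpt above and a comma-separated bbox value; the function name is hypothetical, and since the point format inside <polyline> isn't shown here the sketch doesn't try to parse it.

```python
import xml.etree.ElementTree as ET


def iter_shapes(source):
    """Yield (woe_id, place_type, alpha, bbox) for each <shape> in the dump.

    iterparse streams the file, so the whole thing never sits in memory at once.
    """
    woe_id = place_type = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "place":
            # attributes are already available on the start event
            woe_id = int(elem.get("woe_id"))
            place_type = elem.get("place_type")
        elif event == "end" and elem.tag == "shape":
            polylines = elem.find("polylines")
            bbox = None
            if polylines is not None and polylines.get("bbox"):
                bbox = tuple(float(v) for v in polylines.get("bbox").split(","))
            yield woe_id, place_type, float(elem.get("alpha")), bbox
        elif event == "end" and elem.tag == "place":
            elem.clear()  # drop the finished <place> subtree to keep memory flat
```

Usage would be something like `for row in iter_shapes("path/to/dataset.xml"): ...`, with the path being whatever you unzipped the download to.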



Aside from the quirkiness of the shapes themselves, it is worth remembering 
that some of them may just be wrong. We work pretty hard to prevent Undue 
Wronginess from occurring but we've seen it happen in the past and so it would be, 
well, wrong not to acknowledge the possibility. On the other hand we don't think we 
would have gotten this far if it wasn't mostly right but if you see something that 
looks wrong, or weird, please let us 

know http://tech.groups.yahoo.com/group/yws-flickr/ .

The dataset is available for download, today, from: 

http://www.flickr.com/services/shapefiles/1.0/

The other exciting piece of news is that the Yahoo!
GeoPlanet http://www.ygeoblog.com/ team has also released a public dataset
of all their WOE IDs http://developer.yahoo.com/geo/ that include parent
IDs, adjacent IDs and aliases (that's just more fancy-talk for "different names for the
same place") under the Creative Commons Attribution
License http://creativecommons.org/licenses/by/2.0/ .

Which is pretty awesome, really. 




They've also released the GeoPlanet Placemaker 

API http://developer.yahoo.com/geo/placemaker/ . You feed it a big old 
chunk of free-form text and then "the service identifies places mentioned in text, 
disambiguates those places, and returns unique identifiers (WOEIDs) for each, as 
well as information about how many times the place was found in the text, and 
where in the text it was found." 

Again, Moar 
Awesome http://www.flickr.com/photos/mbiddulph/2327731497/ .



And a bit dorky. It's true. The data, all by itself, won't tell a story. It needs 
people and history to make that possible but as you poke around all this stuff don't 
forget the value of having a big giant, and now open, database of unique identifiers 
and what is possible when you use them as a bridge between other things. Without 
WOE IDs we wouldn't have been able to generate the 

Shapefiles http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/
or do the Places project http://www.flickr.com/places/ or provide a way to
search for photos by place, rather than
location http://toys.lerdorf.com/archives/49-select-from-world.html .



Enjoy! 



Oh, and those "unidentified" outliers, in New York City, that I mentioned in 
the last post about the donut hole 

Shapefiles http://code.flickr.com/blog/2009/05/06/the-absence-and-the-anchor/ : The Bronx Zoo, Coney Island and Shea Stadium. Of course!




photos by ajagendorf25, auggie tolosa and the .sky 



Not bad for two and a half years' work, I guess.
Help! I'm being chased by a
bubblegum machine!! 



That was me, talking (part three) 

That was me, talking (part two) 

That was me, talking (part one) 

This is me, talking 

I Am Here Map (with apologies to Simon) 



That was me, talking (part three) 



Nothing says luxury like airport hummus. I'm sitting here in the Vancouver
airport http://www.flickr.com/places/YVR#recent with hours to kill after
triggering every single warning signal known to US customs and immigration
officials: Arriving with way too much time to kill, being efficient in line and
generally dismissive of Air Canada staff, carrying a ten-year-old passport
unrecognizable by any machine produced post September 11th, wearing a t-shirt
and probably smelling bad.

Which is sort of what I imagined speaking at GeoWeb
2009 http://geowebconference.org/ would be like. Looking through the
conference schedule this spring I wasn't entirely sure why they accepted my 
proposal. Or what sort of reception it would get. 




What follows are my unedited presenter's notes. This talk was delivered at a 
time when I did not write things out in long-form and usually just got up on stage 
and did a sort of stream of consciousness dialog. Sometimes it worked, sometimes it 
didn't. Mostly, though, it's meant that the ideas just kind of fade into the mist...




Hi, my name is Aaron. The short version is that once upon a time I was still 
a painter and then the Internet happened. These days I am the senior engineer at 
Flickr where I work on all the backend stuff for the geotagging project. 

I feel a little bit like that guy because I am that guy but also because this talk 
feels a little like carrying plastic cups full of beer across a crowded room. I mention 
that to remind myself that it's possible but also to say: 




I come in peace. This is a bit of an open-ended talk. I do not have any 
answers. I do not have solutions, strategies or products. Instead I have a ... I want to 
illustrate some of these ideas using stories from our experience ... 



Executive Summary

comics 

dragons 

printmakers 

creation myths 



I promise you, all these things have something to do with the web. And 
geography. 




Nearby 



Earlier this spring we released the "Nearby" project. At its simplest, Nearby
is a radial query for geotagged photos near a point whether that point is a coordinate 
entered by hand or associated with a photo. We have enough geotagged photos that, 
in cities anyway, every point becomes a looking glass. 
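Stripped to its bones, a radial query looks something like the sketch below. This is not how Nearby is actually implemented (anything at Flickr's scale would lean on a spatial index rather than scanning every photo); the Photo record, its taken timestamp and the 1km default radius are all made up for illustration. It measures great-circle distance with the haversine formula and lets you sort the results by distance or by time.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Photo:
    id: int
    lat: float
    lon: float
    taken: int  # hypothetical field: when the photo was taken, as a Unix timestamp


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(photos, lat, lon, radius_km=1.0, sort_by="distance"):
    """Return photos within radius_km of (lat, lon), sorted by distance or by time taken."""
    hits = []
    for photo in photos:
        d = haversine_km(lat, lon, photo.lat, photo.lon)
        if d <= radius_km:
            hits.append((d, photo))
    if sort_by == "distance":
        hits.sort(key=lambda hit: hit[0])
    else:
        hits.sort(key=lambda hit: hit[1].taken)
    return [photo for _, photo in hits]
```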

We did not want a "God's Eye" view. 



Allowing users to construct a narrative based on the relationship between the 
photos, sorted by time and distance. 



"Dots on a map?" 




Nevertheless, one of the questions that's been asked is: Wouldn't this have 
been better or easier as dots on a map. I think the answer is no. 




The comic book artist and writer Scott McCloud has called this the magic in
the gutter. The gutter is the space between panels in a comic strip, in between
the action, where the reader fills in the story.







Hammock of interpretation 




This is taken from a series of illustrations that James Bridle did to try and 
understand the conflict in Gaza and the West Bank. ... Room to imagine. 




This isn't anything new, really. It's what we've been doing with maps and
geography for most of our history because, until recently, lacking the tools to
accurately map the world around us, we have ...




What we can accomplish today is pretty astonishing. 




Now, this is the part where I say bad things about Google Earth and
Microsoft's PhotoSynth and everyone thinks I'm crazy. I want to make it clear that I
think both of these tools are amazing. And important. And I'm not going to stand
here and suggest we go back to a world without them.

(To be at the center of the map... Fuck that!)

What I am going to say is that the ... betray a desire to create a mirror
world... stitching... what's the point.



I don't want to point fingers at anyone and say they are doing the bad thing. 
I do want to say that we risk doing this without being aware of it. And by doing it 
we paint ourselves into a corner that we'll more than likely be able to get ourselves
out of. But life is short and there are better, more interesting problems to work on. 




Recently Google has been adding more and more photos from its users to 
StreetView, in effect "Nearby". What they've also done in the last couple of weeks 
is introduce the element of directionality, presumably to create an ooh-shiny ... to
compete with PhotoSynth. 

And yet, the most interesting thing about the current implementation anyway 
is not the photos but the abstract representation of where the photos are and their 
vantage point. These are the gutters. It's not a criticism of the photos themselves but 
the act of ... fails. 







It's not that we shouldn't have accurate tools with a high degree of fidelity 
but if you are building tools for a community of users (or viewers) what is gained by 
sucking all the air out of the room?




If for no other reason than that you limit the potential of your ... Who here 
remembers LineDrive? It was a fantastic research paper published by Microsoft in 
200?? that described an algorithm for rendering driving directions in the manner 
that you or I might scribble them on the back of a map. 

This is one of my favourite maps. It's a good example of the comics technique of
juxtaposing highly stylized characters on a realistic and detailed
background. But the other reason I like it so much is that you could have replaced 
the ... contour of the bay with a single arc and people in San Francisco would have 
understood exactly where the coffee shop is. 



They would have filled in the details of getting to Judah or 46th themselves. 




And there are other ways of marrying accuracy with interpretation. It's a
really great example of what you can do with [WORDS] data without necessarily
falling into the trap of creating a mirror world. This is the "Here and There" map
created by Schulze and Webb, a design agency in London. I've seen ... of similar
work that AutoDesk is doing. I really like the notion of creating these sorts of
"proximate spaces", spaces that ...










Last year, we started generating and publishing shapefiles that are created
using only the geodata associated with geotagged photos in Flickr.

The larger pink shape is the shape of New York City, or rather the
metropolitan area of New York. It is like a perfect storm of bad assumptions on our
part and bad data from our providers grafted on to the constantly changing
assumptions of New Yorkers' sense of themselves and the physical space they
occupy.

The smaller white and red shapes are generated from photos whose
neighbourhoods are children of New York City. A couple interesting things start to 
happen. One, the larger shape in the middle is a whole lot closer to what most 
people think of as New York, encompassing the five boroughs and even parts of 
New Jersey. This may not be the administrative reality but it does mirror facts on 
the ground. 



And those other three smaller satellite shapes? 




The Bronx Zoo. Shea Stadium. Coney Island. I mention this because we 
invest a lot of time and energy into trying to accurately reverse geocode our users'
photos. If they are going to go to the trouble of ... 




Here's another example: Tokyo. See that smaller shape at the bottom of the
photo?




That turns out to be Haneda Airport, the one that most people don't even 
know exists. 




Airports are another example of the flexibility to accommodate different
models.
"... at an airport the individual is defined, not by the tangible ground
mortgaged into his soul for the next 40 years, but the indeterminate flicker of flight 
numbers trembling on an annunciator screen. We are no longer citizens with civic 
obligations, but passengers for whom all destinations are theoretically open, our 
lightness of baggage mandated by the system. Airports have become a new kind of 
discontinuous city, whose vast populations, measured by annual passenger 
throughputs, are entirely transient, purposeful and, for the most part, happy." 



We have a lot of users who spend a lot of time in airports. This is just one 
user who walked the shape of the airport in Nevada between flights. Technically 
airports are an administrative area or a private enterprise and are indexed 
accordingly. In reality, airports have emerged as a kind of city-state. 
"History as new media"



Now, I'm not suggesting that everyone follow our lead. I only want to 
highlight ... All neighbourhoods are disputed at the edges. Once you introduce time 
into the mix it only gets more complicated.




Julian Bleecker calls this "design fiction" ... Aaron Koblin 



c) The Street as a Platform, which I'm doing a terrible disservice to, but you
get the idea. When that person takes that photo near that building, that
building should offer them free WiFi, for them to upload that photo to Flickr
(other photosharing services also exist). But in return (with a click-through
Terms and Conditions) it gets to keep a copy of the photo on its servers,
in the basement. All large buildings should offer that service.

One more thing on top of that, however: whenever a building receives a photo, it
exchanges a copy of it with another building within WiFi/internets reach. So
the act of a photographer uploading one photo would put two copies of
that photo into two buildings, and a copy of a second photo would jump
buildings.



Shoeboxes 



Dan Catt has talked about creating places, and histories, by using buildings 
in the city as wireless access points where you pay for use by exchanging 
photographs of that building and its surroundings. This is a fascinating idea that
there's just not time enough to go in to. 



'Becoming the territory' 
Matt Jones described this as the map itself becoming the territory. He was
talking about a physical map that was produced during each day of a big furniture
and design fair in Milan using geotagged photos and Twitter messages and other ...
"using this rapidly-produced thing then becomes a 'social object': creating
conversations, collecting scribbles, instigating adventures - which then get collected
and redistributed. ... The Incidental is a feedback loop made out of paper and human
interactions - timebound, situated and circulating in a place."




March 23, 2009 




Here's where I go right off the deep end and tell you about two meals I've 
had in two different restaurants in the last year. In one I was served a foot long 
"noodle" made of banana. It was half an inch in diameter. At the other I was served 
bacon ice cream for dessert. It's probably obvious that the noodle wasn't very good. 
An interesting idea, probably worth trying once but never again. The other, the ice 
cream, it turns out, was just as bad. Bacon ice cream, if you've never tried it, is the
uncanny valley of food.




And what I'm really talking about is this. And I'm being unfair to both. The 
Situationists were boring but they weren't wrong, though they would choose the
dragons over the landscape every single time. Likewise, synthetic worlds are 
fantastically useful for all kinds of things ... 




So, we've got these two competing approaches. 



I'm not going to show the Naked City map to illustrate the Situationist
approach. We've all seen it too many times and if you still haven't, enjoy the time
you've got left. As an aside, the Situationists inform most of the current thinking
behind ubiquitous computing so you'll be seeing it soon enough.



One of these days, it really wouldn't kill me to do the same talk twice. The 
idea behind the talk is still very much a work in progress and still sort of hard to 
articulate properly. I wanted to talk not about the kinds of things we're building 
around place and geography these days — they're both awesome and 
understandable in an historic context — but rather the stories we tell ourselves 
about what we're building. 

I finished my presentation at State of the 

Map http://www.aaronland.info/weblog/2009/06/01/bubblegum#sotm09 by
talking about the, admittedly hand-wavey, idea of what it would mean to use all the
precision we've accumulated as a blunt instrument. We've collected all this data and
created all these tools in the service of a faithful representation of the world, so what
would it mean to treat it all not as a finely-honed tool but as a sledgehammer in the
service of story-telling and understanding the world?

You would be forgiven for thinking that was just the over-priced airport beer 
talking but this is what I was thinking about when I got here and that's where the 
talk dove in arms-a-flailing... 



2009-07-31T19:03:40-0700 



That was me, talking (part two) 



On Thursday I spoke at the Visual Web 

Meetup http://www.meetup.com/visualweb/calendar/10764407/ , in San 
Francisco. Peter Samis http://www.exhibitfiles.org/peter_samis and
Susan Chun http://digitalmandala.net/2009/03/25/susan-chun-ald09/
were also presenting current research material from the
Steve.museum http://www.steve.museum project. We were talking about tags.
It wasn't my best talk ever partly because I was speaking from the back of the room
(note to self: when someone offers to be your slide changer say yes) and partly
because it was the end of the day and I was still working through
jet-lag http://www.aaronland.info/weblog/2009/06/01/bubblegum/#sotm09 and
partly because I did not have any funny picture-slides.

The last part isn't really true because it's a bit like blaming a camera lens for 
a bad photo and because there is one funny cat picture in the presentation. Anyway, 
I decided to try a minimalist presentation style using only big words and, 
essentially, two colours. When this works it can be really lovely, like Dave McKean's
black and white and blue graphic novel
Cages http://www.amazon.com/gp/reader/1595823166/ref=sib_dp_ptu#reader-link .

This was not that lovely but people seemed to enjoy the talk. 

What follows are my unedited presenter's notes so apologies in advance for 
incomplete sentences and the like... 



$tag[$tags] = $tags; 



This is an actual part of the Flickr code base. No one can bring themselves to 
change it now. 




My name is Aaron. The shortest possible introduction is that once upon a 
time I was still a painter, and then the Internet happened. 




These days I am a senior engineer at Flickr. We are a small(ish) photo-sharing
website, specializing in pictures of cats. And other things. We are also
known for the many tags that our users have added to their photos. We have about 
40M unique tags. 



del.icio.us 



We didn't invent tagging. We "borrowed" the idea from Joshua Schachter's 
social bookmarking website del.icio.us. 



keywords 

facets 
topic maps 
categories 
ontologies 



It's important to remember that Joshua didn't invent tagging either. We've 
been chasing systems and forms of classifying information for as long as we've 
been collecting anything worth calling information. 



ponies 



It's been a bit of a wild goose chase really. More than anything, formal 
ontologies outside of so-called domains of expertise are hard to master and, if we're 
being honest about it, boring to use. 



tags 



(good enough is perfect)



But del.icio.us offered tangible proof that if you make the process (for 
adding tags) simple enough and provide tools for managing those tags then people 
will participate. Small tools for self-organization. I'll come back to this idea later 
on. We added tags because it provided a fast, cheap and easy way for our users to 
catalog their photos. If that were all tags did, though, they wouldn't be that 
interesting. They also double as a kind of foot-bridge between users and meaning; 
little rabbit-holes of serendipity. 



tag clouds 



One of the earliest tools for managing the volume of tags was a text-based 
visualization called a "tag cloud" where the size of each tag displayed is relative to 
the number of photos associated with it. We're sorry about tag clouds. 
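For the record, the arithmetic behind a tag cloud is usually something like this: scale each tag's font size between a minimum and a maximum according to the logarithm of its count, so that "wedding" doesn't drown out everything else. This is a generic sketch, not the formula Flickr used; the pixel range and the choice of a log scale are assumptions.

```python
import math


def tag_cloud_sizes(counts, min_px=10, max_px=32):
    """Map each tag to a font size (px) scaled by the log of its photo count (counts >= 1)."""
    if not counts:
        return {}
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    span = (hi - lo) or 1.0  # all counts equal: avoid dividing by zero
    return {
        tag: round(min_px + (math.log(n) - lo) / span * (max_px - min_px))
        for tag, n in counts.items()
    }
```

With a linear scale instead of a log scale, a single very popular tag squashes every other tag down to the minimum size, which is the main reason the log is the usual choice.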



hawt tags 



Eventually we started to experiment with a variety of algorithms for 
detecting new and interesting tags. It's worth noting that there hasn't been a day 
since I've started working at Flickr when "wedding" wasn't the top tag so that 
should tell you something about ranked lists. 



tag clusters 



We also added tag "clusters" which are generated nightly by analyzing the 
entire corpus and feeding them through a variety of hierarchical clustering 
algorithms. Clusters are good serendipity magnets. As a rule, I find the associations 
between the different clusters more interesting than the associations between the set 
of tags in a given cluster. Maybe that's just me. 



tag maps 



In 2006 the Yahoo! Research Berkeley (YRB) team released the tag maps 
project that generated a dynamic, map-driven interface to Flickr photos by 
analyzing their tags for geographic information. We implemented something like 
that, incorporating the work we'd done with hot ("hawt") tags, for the second 
iteration of the Flickr map. More recently, researchers have expanded on the work 
done by YRB and published a really fascinating paper called "Mapping the World's 
Photos". I'm not going to talk about it now but the paper is definitely worth reading. 



machine tags 



In 2007, we added formal support for "machine tags". Machine tags are 
really nothing more than regular tags with a special syntax to denote a faceted 
relationship: a namespace (or a subject domain); a predicate (or a subject topic); and 
a value. Our users had already been adding tags using a machine tag like syntax and 
then parsing out the structure, and the meaning of those tags, themselves using the 
Flickr API. What we added was the ability for Flickr to recognize and index the 
different pieces of a machine tag and to allow users to search for them across the 
entire corpus of photographs accordingly. 
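Splitting a machine tag into its three pieces is about as simple as it sounds. The sketch below assumes that the namespace and predicate start with a letter and contain only letters, digits and underscores, and that the value is everything after the first "="; the exact rules Flickr applies may differ and the function name is hypothetical.

```python
import re

# Assumed rules: namespace and predicate start with a letter and contain only
# letters, digits and underscores; the value is everything after the first "=".
MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")


def parse_machine_tag(tag):
    """Split 'namespace:predicate=value' into a tuple, or return None for a plain tag."""
    match = MACHINE_TAG.match(tag)
    if match is None:
        return None
    namespace, predicate, value = match.groups()
    return namespace, predicate, value.strip('"')  # quoted values lose their quotes
```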



machine tag extras



Machine tag extras is what we call the process of using the value of a
machine tag to look up data in another service (as defined by the namespace) and
squirting that information back into Flickr. For example, we recently added machine
tag extras support for the Open Library so that when someone tags their photo with
an Open Library identifier we can display the name of that book.



wildcard tags 



We also added the ability to query for machine tags as part of a plain old
URL. For example, if you want to see all the photos that people have taken at places
to eat in the Dopplr Social Atlas you can just go to
http://www.flickr.com/photos/tags/dopplr:eat= . Or all the
photos taken at Upcoming or Last.fm events:
http://www.flickr.com/photos/tags/*:event=
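The wildcard matching those URLs imply can be sketched like this, assuming that an empty part or a bare * in the pattern means "anything". The helper names are hypothetical and this says nothing about how the Flickr search index actually resolves such queries.

```python
def split_machine_tag(tag):
    """Split 'ns:pred=value' into (ns, pred, value); the value may be empty."""
    head, sep, value = tag.partition("=")
    namespace, colon, predicate = head.partition(":")
    if not sep or not colon:
        return None  # a plain tag, not a machine tag
    return namespace, predicate, value


def matches(pattern, tag):
    """True if tag matches a pattern like 'dopplr:eat=' or '*:event='."""
    want = split_machine_tag(pattern)
    have = split_machine_tag(tag)
    if want is None or have is None:
        return False
    # an empty part or a bare "*" in the pattern matches anything
    return all(w in ("", "*") or w == h for w, h in zip(want, have))


def filter_tags(pattern, tags):
    return [t for t in tags if matches(pattern, t)]
```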



linked data 



For anyone familiar with the idea of the Semantic Web, machine tags might
look like a familiar but somewhat casual implementation, or a variation on the theme.
They are. Machine tags try to provide some of the bridging facilities of the semantic 
web but without forgetting the original lesson that del.icio.us offered: Keep it 
simple. 



commontag 



Recently a project called Common Tag has been launched. It seems to be a 
short form for addressing authoritative topic descriptors in web pages. I haven't 
decided what I think about it. 



tagopedia 



In 2006 Dave Beckett presented a really great paper called "Semantics 
Through the Tag" at XTech, in Amsterdam. One of the ideas Dave proposed was 
setting up a Wikipedia-like site for tags, to document their many meanings and uses. 
He chose Wikipedia specifically because that community has developed lots of
tools for managing conflicts and mechanisms for disambiguating concepts. I think it's a
great idea. Unfortunately, neither Dave nor I want to actually run the site so if 
someone here wants to take on that responsibility I think it could be a really 
valuable resource over time. 



equivalencies 



That would certainly help with managing equivalencies in tags, whether it's
equivalencies in concepts or just across languages. We don't do anything like that 
right now on Flickr. Personally, I'd like to allow users to define equivalencies 
between tags but we haven't been able to think of a way that would be easy enough 
to warrant doing. 



lexicon 



But really what we have is this fantastic lexicon of terms and connections 
that keeps growing every day. We make a point of trying to expose as much of that 
information as we can via the API and are eager to see someone tease out the shape 
of language on Flickr. That's some of what we've done with things like hot 
("hawt") tags and the clustering but there are still plenty of interesting possibilities to
explore. 



first class objects 



There is also the question of when and why a tag evolves into a first-class
data type and whether that's actually reflected in how people use tags. Dates
are one example, and geotags another. Each is indexed in its own right in the Flickr
database and, still, people continue to add both as tags on their photos. The short 
answer, of course, is that it's usually just easier to type 2008 or 2009 than to try and 
remember a specialized syntax for doing searches. 



magic words 



That's the funny thing about language. 



tag as "small horses" 

tag as ponies 



There's been a really interesting discussion on the FlickrCommons group 
around a blog post written by Larry Cebula questioning the limits of
user-contributed tags, notes and comments, citing the volume of conversational additions
like "cool" and "awesome". Another camp argues that the value of user-contributed
data comes not simply from analyzing the photograph itself but from analyzing the
activity that surrounds a photograph.



play 



(social objects) 



The photograph is a "social object" around which people can use tags, and 
notes and comments, to have a conversation. The different kinds of metadata are 
devices for shuttling the discussion in a variety of different ways. While the signal
to noise ratio can often be lower than researchers are used to, the contributions from
the "commons" have also proven to be valuable and rewarding.



openlibrary:actionshot= 



And the value of play as a motivator shouldn't be underestimated. I 
mentioned earlier that we added machine tag "extra" support for the Open Library. 
This prompted one user to ask (the Open Library staff) whether they could, or 
should, tag a photograph of themselves reading a book, rather than the cover itself, 
with an "openlibrary:id" machine tag. The answer was: Of course, why not! It is
early days and we can still make our own consensus so let's see where it goes. Or 
maybe tag it as an "action shot" instead. 



horse=yes 



It's not as crazy as it sounds. The Open Street Map project, whose mission it
is to map the entire world, uses just this approach and they've been surprisingly
successful. In just five or six years they've managed to produce a dataset in the UK
nearly as good as the Ordnance Survey's, which has had ... By using tags like this.
No, really. 



time:hour= 



We've used tags as part of the Flickr Clock, a visualization of videos 
uploaded throughout the day, that was created for us by Stamen Design. Most 
videos don't have very much useful metadata, including the day or the time they 
were created. We were able to use machine tags to give people a way to add 
structured date/time information to their videos, which was then interpreted by the
Flickr Clock application. 
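As a rough illustration of how a visualization like that might read the structure back out, here is a sketch that buckets videos by a time:hour= machine tag. It assumes the value is an hour from 0 to 23; the video IDs and tags are hypothetical, and this is not the Flickr Clock's actual code.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Assumed form: time:hour=<0-23>, e.g. "time:hour=14" for 2pm.
HOUR_TAG = re.compile(r"^time:hour=(\d{1,2})$", re.IGNORECASE)


def hour_from_tags(tags: List[str]) -> Optional[int]:
    """Return the first valid hour found in a time:hour= machine tag, if any."""
    for tag in tags:
        match = HOUR_TAG.match(tag.strip())
        if match and 0 <= int(match.group(1)) <= 23:
            return int(match.group(1))
    return None


def bucket_by_hour(videos: Dict[str, List[str]]) -> Dict[int, List[str]]:
    """Group video IDs by hour; videos without a usable tag are left out."""
    clock: Dict[int, List[str]] = defaultdict(list)
    for video_id, tags in videos.items():
        hour = hour_from_tags(tags)
        if hour is not None:
            clock[hour].append(video_id)
    return dict(clock)


videos = {
    "v1": ["time:hour=7", "breakfast"],
    "v2": ["time:hour=23"],
    "v3": ["time:hour=7"],
    "v4": ["time:hour=25"],  # out of range, ignored
    "v5": [],
}
print(bucket_by_hour(videos))
```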



In the process we were able to teach people how to add tags and how to use 
them and, hopefully, see their value. I'm pretty sure (or at least like to believe) that 
the moment people understand how something is useful for them is also the
moment they start to think about how to play with it and how to use it for something 
entirely new. 



discovery 



This is also called discovery. 




No one said it was easy. 



nubby bits 



Formal ontologies are useful when you know the boundaries of your domain. 
I've seen people in the public safety sector get very excited about them because it 
means they can keep track of where their ambulances are. This is a good thing. I
want them to know where the ambulances are. But it's a pretty brittle approach 
when applied to something as wide-open and open-ended as the Internet and even 
more so when it involves communities from all over the world coming together to 
share and discuss their photos. We try to be mindful of building what a colleague 
described as "small tools for self-organization". Tags are one such tool because
there's just enough convention (language) for people to have a common ground to 
operate on but still have a rough enough surface to hang new and wacky ideas off 
of. 



thank you 



I'm not sure how I would do the same talk differently. I certainly didn't read 
from the notes which have since been edited to reflect what I meant to say and what 
I wish I'd said. I wouldn't use yellow text again unless I was confident that the room 
I was speaking in was pitch dark. 
That was me, talking (part one)



I am back from State of the Map http://www.stateofthemap.org/ . It
was fantastic.

I spoke on Friday morning about how Flickr uses Open Street
Map http://www.openstreetmap.org/ and people seemed to enjoy it. One part 
of the talk that was interesting for me was threading the needle between 
acknowledging the obvious existence of an unhappy version of the story while not 
dwelling on it or saying things that, uh... did I say that? 
Would that I had written presenter's notes, right...
Yahoo! Maps, Driving Directions, and Traffic / -34.6055, -58.4719 - Google Maps / Open Street Maps on Flickr




./fetch_tiles.py -p africa.txt -o ~/osm



(my harddrive) 



Flickr image storage 



map_openstreetmap_tile_broker.gne






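import http.client
import io
import os
from urllib.parse import urlparse

from PIL import Image


# The original fragment ran inside a loop over tiles; this wrapper (and the
# shape of tiles, a list of (tile_url, cache_file) pairs) is hypothetical.
def fetch_tiles(tiles):
    for tile_url, cache_file in tiles:
        cache_dir = os.path.dirname(cache_file)

        if not os.path.exists(cache_dir):
            try:
                os.makedirs(cache_dir)
            except OSError:
                # Another worker thread probably created the directory first
                print("failed to makedirs, skipping because it's probably a thread thing...")
                continue

        # Fetch the tile from the remote tile server
        url = urlparse(tile_url)
        conn = http.client.HTTPConnection(url.netloc)
        conn.request('GET', url.path + ('?' + url.query).rstrip('?'))
        response = conn.getresponse()

        # Only cache successful (2xx) responses, normalised to RGBA
        if str(response.status).startswith('2'):
            tile_img = Image.open(io.BytesIO(response.read())).convert('RGBA')
            tile_img.save(cache_file)
            # imgs.append(tile_img)

        conn.close()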



http://farm2.hv-static.flickr.com/1010/temp/osm/12/3374/1551.png

http://www.flickr.com/map_openstreetmap_tile_broker.gne?t=m&x=3374&y=496&z=6
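The two URLs above describe the same tile in two coordinate schemes. Assuming Yahoo! Maps counted zoom levels down from 18 and numbered tile rows upward from the equator, while OSM uses the "slippy map" z/x/y scheme, the numbers in those URLs line up with the conversion sketched below. Whether the tile broker did exactly this is an assumption, and the function names are hypothetical.

```python
from typing import Tuple


def yahoo_to_osm(x: int, y: int, z: int) -> Tuple[int, int, int]:
    """Convert a Yahoo!-style tile address to an OSM (z, x, y) address.

    Assumes Yahoo zoom levels run the other way (osm_z = 18 - z) and that Yahoo
    rows count upward from the equator, so rows flip around 2 ** (osm_z - 1).
    Columns are the same in both schemes.
    """
    osm_z = 18 - z
    osm_y = 2 ** (osm_z - 1) - 1 - y
    return osm_z, x, osm_y


def osm_tile_path(x: int, y: int, z: int) -> str:
    """Build the z/x/y.png suffix used by OSM-style tile servers."""
    osm_z, osm_x, osm_y = yahoo_to_osm(x, y, z)
    return f"{osm_z}/{osm_x}/{osm_y}.png"


# The broker request x=3374&y=496&z=6 maps to the cached tile 12/3374/1551.png.
print(osm_tile_path(3374, 496, 6))
```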



"The tricky part was with the Yahoo Maps ... in that you can only
check to see if you're *in* a valid OSM area, *after* you've zoomed/
moved the map, not *before* ('cause you don't know where the map
is *about* to move to) ... but *after* you've zoomed/moved the map is
already loading in new tiles ... so you need to tell it 'STOP, wait, don't
load those, load these OSM ones instead' ... and in the case of Yahoo
Maps that meant using an undocumented function that blows away
the tile cache and forces a tile reload."
Story tiles
The only thing I've changed slightly in the slides to reflect a world where I
would do it differently next time is the inclusion of Stewart's "vision statement" for
Flickr: to be the Eyes of the
World http://blog.flickr.net/en/2006/03/24/eyes-of-the-world/ . I think
that the Open Street Map community does the same thing and I think it's one of the 
reasons that we like it so much! 

At the very end, I tried to belabour two important thank yous: 

• For helping make the thing that we (at Flickr) care about and struggle 
to work on every day better. 

• For proving the nay-sayers wrong. 

The goal of the Open Street Map project, when it began, was of such hare-brained
proportions as to be laughable. Which is rarely a reason not to do something
and (however many years it's been) on, they've not only pretty much done it, they
have also, despite all the inward facing questions about participation and conflict
resolution and general sausage making, done it in a way that probably
ensures it will keep going and only get better.

There is no going back to a world before Open Street Map and that is, in 
many ways, just as impressive as the actual "thing" they've created. One reason the 
Internets have always excited me is that they afford the possibility of building the 
world we want to live in so I am doubly happy when someone does. 

There's no time to do a proper recap of the rest of the conference, right now,
so I will end by parroting Mike http://mike.teczno.com/notes/slides/open-paper-maps.html
when he says: Amsterdam is
magic http://www.flickr.com/photos/straup/sets/72157621283220147/ !
This is me, talking



As it happens, I will be speaking quite a lot this month. I will be making the 
words about: 

• The Flickr-OpenStreetMap integration at State of the
Map http://www.stateofthemap.org/ , in Amsterdam later this
week.

• Flickr and "Social Tagging of Multimedia Collections" at the
Visual Web Meetup http://www.meetup.com/visualweb/calendar/10764407/ ,
in San Francisco on July 16.

• Maps and dragons and comics (aka the "Undiscovered Country")
at GeoWeb 2009 http://geowebconference.org/ , in Vancouver
at the end of July.

With any luck "The Thing, With The Stuff" will be ready to talk about 
publicly by the time Geoweb rolls around but since July has already shaped up to be 
sixteen different ways to crazy that might be ambitious. 




In the interim, the Talk Is Cheap Department offers instead a small thing I 
made with the I Am Here Map http://www.aaronland.info/iamhere/ to test 
the new and shiny 

flickr.places.getTopPlacesList http://www.flickr.com/services/api/flickr.pl
API method. Here are: 

• The top 100 countries with geotagged
photos http://www.aaronland.info/topplaces

• The top 100 regions (states) with geotagged
photos http://www.aaronland.info/topplaces/?type=8

• The top 100 regions, in the United States, with geotagged
photos http://www.aaronland.info/topplaces/?type=8&woeid=23424977 .
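For anyone who wants to try the same thing, the calls behind lists like these look roughly like this. flickr.places.getTopPlacesList takes a required place_type_id (8 for regions and 12 for countries, matching the type=8 in the URLs above) and an optional woe_id to narrow the results. The REST endpoint and the top_places_url helper below are assumptions for illustration, and you'd need your own API key.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed REST endpoint for the Flickr API.
REST_ENDPOINT = "https://api.flickr.com/services/rest/"

# Place type IDs accepted by flickr.places.getTopPlacesList.
PLACE_TYPES = {"neighbourhood": 22, "locality": 7, "region": 8, "country": 12, "continent": 29}


def top_places_url(api_key: str, place_type: str, woe_id: Optional[str] = None) -> str:
    """Build a GET URL for the top places of a given type, optionally within a WOE ID."""
    params = {
        "method": "flickr.places.getTopPlacesList",
        "api_key": api_key,
        "place_type_id": PLACE_TYPES[place_type],
        "format": "json",
        "nojsoncallback": 1,
    }
    if woe_id is not None:
        params["woe_id"] = woe_id
    return REST_ENDPOINT + "?" + urlencode(params)


# Top regions within the United States (WOE ID 23424977), as in the last link above.
print(top_places_url("YOUR_API_KEY", "region", woe_id="23424977"))
```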

Also, extra:extra=extra http://code.flickr.com/blog/2009/07/06/extraextraextra/ !
I Am Here Map (with apologies to Simon)



I finally got fed up of hunting around for simple latitude/longitude tools 
when messing around with mapping APIs, so I built my own with a 
memorable URL. 

Simon Willison http://simonwillison.net/2007/Oct/12/latlon/



Simon http://simonwillison.net/ will be remembered for many things
and I hope one of them is getlatlon.com http://www.getlatlon.com/ . It's a
remarkably simple site (you drag a map around and it displays the latitude and
longitude of the center point) but it is insanely useful for all kinds of things and no
one, before Simon, had managed to put two and two together. It was definitely a
Duh! moment for me.
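Under the hood, a getlatlon-style tool only has to turn the map's centre point back into a latitude and longitude. For the spherical ("Web") Mercator tiles used by OSM-style maps that inverse projection is short. The sketch below assumes 256-pixel tiles and global pixel coordinates at a given zoom level, and it is not taken from getlatlon.com itself.

```python
import math
from typing import Tuple

TILE_SIZE = 256  # assumed tile size in pixels


def pixel_to_latlon(px: float, py: float, zoom: int) -> Tuple[float, float]:
    """Inverse spherical Mercator: global pixel (px, py) at a zoom level -> (lat, lon) in degrees.

    Pixel (0, 0) is the top-left (north-west) corner of the world map.
    """
    world = TILE_SIZE * 2 ** zoom
    lon = px / world * 360.0 - 180.0
    n = math.pi * (1.0 - 2.0 * py / world)
    lat = math.degrees(math.atan(math.sinh(n)))
    return lat, lon


# The centre of the single zoom-0 tile is (0, 0); its top-left corner is about (85.05, -180).
print(pixel_to_latlon(128, 128, 0))
print(pixel_to_latlon(0, 0, 0))
```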

I use it all the time, now, and I needed something like it for a side project I'm
working on. My first thought was to clone Simon's work but when
Tom http://www.tom-carden.co.uk/ , from Stamen, told me that they had been
working on a (still) experimental Javascript branch of the ModestMaps
code http://modestmaps.mapstraction.com/trac/browser/trunk/js I
decided to try that instead.

Most days I still haven't gotten over the initial shock and awe of seeing 
Google Maps for the first time but at a certain point All-Things-Google-All-The- 
Time starts to feel like walking on thin ice. They're doing just fine competing on 
features (a good thing!) but I think it's important we also continue to build and support
an infrastructure of tools http://highearthorbit.com/mapstraction-updates/ that
people can run and host themselves.

With that in mind, I set out to write my own "getlatlon" map tool last week. 
It's called "I Am Here Map" and on Sunday I finally pushed it out the door. 

• It does basic getlatlon-style lookups



• It does geocoding using either the Google Maps or Flickr APIs. It's 
part of the plan to support a variety of other geocoding services soon. 

• It does reverse geocoding, using the Flickr API. 

• It will fetch and display the
shapefile http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/
associated with a point, assuming it's been reverse
geocoded successfully.

• It does client-side geolocation/positioning (aka "find my
location http://code.flickr.com/blog/2009/04/16/changelog-find-my-location-button/ ")
using a variety of third-party services.

• It allows for an arbitrary number of map tile providers and styles. At 
the moment, there is only one tile provider: 

CloudMade http://www.cloudmade.com/ .

• The whole thing, including core dependencies, is bundled in a single 
Javascript file and loaded with a couple lines of code. 

Like this: 



(Screenshot: a search for "baghdad iraq", with "FIND THIS PLACE or find my location" controls, centred on 33.31570000000001,44.392199999999995 and labelled "Baghdad, Baghdad, Iraq (WOE ID ...)".)



(Map data CC-BY-SA http://creativecommons.org/licenses/by-sa/3.0/ 2009
OpenStreetMap.org http://openstreetmap.org/ contributors,
because that's an early screenshot before I added proper attribution.)

This is what the code to generate that map looks like: 



// as in <script src="iamheremap.js"></script>
// and <div id="map"></div>

// YER_FLICKR_APIKEY and YER_CLOUDMADE_APIKEY are placeholders for your own keys:
// http://www.flickr.com/services/api/keys
// http://developer.cloudmade.com/

var args = {
    'modestmaps_provider' : 'CloudMade',
    'flickr_apikey' : YER_FLICKR_APIKEY,
    'cloudmade_apikey' : YER_CLOUDMADE_APIKEY,
    // CloudMade tile style ID, see http://www.sensescape.com/2009/02/cloudmade/
    'map_style' : 9699,
    'map_height' : 480,
    'map_width' : 640
};

window.map = new info.aaronland.iamhere.Map('map', args);



You can see a live demo over here, using a different "style", specifically
Matt Jones' Image of the City
tiles http://magicalnihilism.wordpress.com/2009/04/06/my-first-cloudmade-map-style-lynchianmid/ :

http://www.aaronland.info/iamhere#style=2241

The "style=" stuff isn't part of the default install but it's a simple example of 
the sort of thing you can do. At the moment there are no standard controls to toggle 
between n number of map views/tiles but that's obviously a good next step. It's also 
one of the things that ModestMaps makes really 

easy! http://modestmaps.com/tutorial-actransit/

I am pleased and excited by all of this. One of the things I talked about at
PaperCamp http://adactio.com/journal/1546/ was the desire to get back to
building some of the online tools for Papernet projects. That includes a generic
list-map-store style interface for things like geotagging Twitter
posts http://www.aaronland.info/weblog/2009/03/14/buckets/#intimacies ,
a new and Moar Bettar version of
deliciousmaps http://www.aaronland.info/weblog/2007/08/24/aware/#delmaps
and the Other Thing (with the Stuff).

The "I Am Here Map" is not that toolkit but it's a building block, at least. 
list map store format



As usual, the code and a list of known-knowns (this has only been tested in 
Firefox and Safari, for example) is available over on Github: 

http://github.com/straup/js-iamheremap/ 



Enjoy! 
ice cream is my subfloor



Flickr (App) For Busy People 






The Thing (with the Stuff) is taking longer than planned because life has 
gotten in the way and I haven't been able to work on it since before leaving for 
Amsterdam http://www.aaronland.info/weblog/2009/06/01/bubblegum/#talk .

The Thing (with the Stuff) has been squirting out over time, though, in small
chunks like the I Am Here
Map http://www.aaronland.info/weblog/2009/06/01/bubblegum/#iamhere
and more recently the FlickrApp https://github.com/straup/gae-flickrapp/tree
package, a library for use with Google's
AppEngine http://code.google.com/appengine/ platform that lets you treat
the Flickr Auth API http://www.flickr.com/services/api/auth.howto.web.html as a single
sign-on and validation service. FlickrApp was in fact the spark for the Thing (with
the Stuff), when I realized that one of the side-effects of treating the API that way is
that you get a user-scoped Auth token for free. This is no different really than
OpenID or any other SSO service, if you're wondering. It's just that you get to do stuff
with the Flickr API http://www.flickr.com/services/api at the end of it.
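For the curious, the web flavour of that Auth API worked roughly like this: the application sends the user to Flickr's auth page with a signed request, Flickr calls back with a frob, and the application exchanges the frob for a token using flickr.auth.getToken. Requests were signed by taking the MD5 of the shared secret followed by every argument name and value, sorted by name. This is a minimal sketch of the signing step only, with a placeholder key and secret and an assumed login URL; it is not FlickrApp's code.

```python
import hashlib
from typing import Dict
from urllib.parse import urlencode

# Login URL as described in the Flickr auth documentation of the time (assumption).
AUTH_URL = "http://flickr.com/services/auth/"


def sign(secret: str, args: Dict[str, str]) -> str:
    """api_sig = md5(secret + name1 + value1 + name2 + value2 ...), names sorted."""
    payload = secret + "".join(name + str(args[name]) for name in sorted(args))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def login_url(api_key: str, secret: str, perms: str = "read") -> str:
    """Build the signed URL that sends a user to Flickr to authorize the app."""
    args = {"api_key": api_key, "perms": perms}
    args["api_sig"] = sign(secret, args)
    return AUTH_URL + "?" + urlencode(args)


print(login_url("YOUR_API_KEY", "YOUR_SECRET"))
```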

It's true that Google creeps me out, a little, these days but it would be unfair
not to point out that AppEngine is a pretty awesome piece of work. It is not a magic
pony http://www.youtube.com/watch?v=i6Fr65PFqfk but it does make writing
small, fast and bespoke
tools http://www.paulhammond.org/2008/12/minimuni/ incredibly easy. As I
write this I am still counting on the promise of
AppDrop http://github.com/jchris/appdrop/tree/master to allow me to
keep running stuff outside of Google's warm embrace if, or when, AppEngine goes
sour. This probably needs to be proven sooner rather than later but assuming that it
works I am happy enough to keep using AppEngine even at the cost of avoiding
some of the fancier features that may never be supported elsewhere.

A few weeks ago, mroth http://www.mroth.info/ showed me Twitter
For Busy People http://www.twitterforbusypeople.com/ . The site itself
doesn't interest me that much because that's not the way I want to use Twitter but
the way in which the data is aggregated and presented does interest me. Some of
this goes back to Dan Hill's Big Floating
Informatics http://www.aaronland.info/weblog/2007/12/20/castles/#zomg
but the short version is that a quick high-level roll-up of the activity of the people
you're interested in is really useful.

In my case, photos my contacts have uploaded to Flickr (especially while I'm 
sleeping).

So I wrote Flickr For Busy 

People http://flickrforbusypeople.appspot.com/ . 

It does very little, by design. It uses the Flickr API to fetch the list of 
contacts who have uploaded 

photos http://www.flickr.com/services/api/flickr.contacts.getListRecentl

in the last 30 minutes, two, four and eight hours and displays their buddyicons along 
with the number of photos they've uploaded. When you click on a buddyicon the 
site will load thumbnails of those photos. And that's it. 
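The bucketing itself is simple. Assuming each contact's most recent upload time is available as a Unix timestamp (flickr.contacts.getListRecentlyUploaded accepts a date_lastupload timestamp to limit how far back it looks), a sketch of sorting contacts into the four windows might look like this. Showing each person only once, in the narrowest window that fits, is an assumption, as are the sample contacts; this is not the site's actual code.

```python
import time
from typing import Dict, List, Optional

# The four windows described above, narrowest first, in seconds.
WINDOWS = {"30 minutes": 30 * 60, "2 hours": 2 * 3600, "4 hours": 4 * 3600, "8 hours": 8 * 3600}


def request_args(window: str, now: Optional[float] = None) -> Dict[str, str]:
    """API arguments asking for contacts who uploaded within the given window."""
    now = time.time() if now is None else now
    return {
        "method": "flickr.contacts.getListRecentlyUploaded",
        "date_lastupload": str(int(now - WINDOWS[window])),
    }


def bucket_contacts(last_upload: Dict[str, int], now: int) -> Dict[str, List[str]]:
    """Place each contact (most recent first) in the narrowest window covering their last upload."""
    buckets: Dict[str, List[str]] = {label: [] for label in WINDOWS}
    for nsid, uploaded in sorted(last_upload.items(), key=lambda item: -item[1]):
        age = now - uploaded
        for label, seconds in WINDOWS.items():
            if age <= seconds:
                buckets[label].append(nsid)
                break
    return buckets


now = 1_250_000_000
contacts = {"alice": now - 10 * 60, "bob": now - 3 * 3600, "carol": now - 7 * 3600, "dave": now - 12 * 3600}
print(bucket_contacts(contacts, now))
```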



(Screenshot: buddyicons grouped under "in the last" 30 minutes (nothing new...), 2 hours and 4 hours, each labelled with the number of photos uploaded.)





I have resisted (am still fighting) the urge to display larger photos inline with 
lightboxes and things like a "fave this" button. I want to try and keep the whole 
thing light and simple, a prism to reflect activity on Flickr rather than a replacement 
or alternate version of it. This is why we have hyperlinks, after all. 

(It does automatically reload the page every 15 minutes because I am lazy 
that way.) 

The finishing polish on Flickr For Busy People took longer than the initial 
prototype which I built in an afternoon. No surprises there. That's just how it works 
(thanks go to George http://abitofgeorge.com/ and
Cal http://iamcal.com/ for their ever-reliable attention to niggly details) but it
was really nice to get something up and running "in the time it takes to have a
meeting http://twitter.com/kellan/status/1447590400 ". Kellan did pretty
much the same thing building Photos That
Matter http://laughingmeme.org/2009/07/22/photosthatmatter-flickrapp/ ,
on top of FlickrApp, in pretty much the same amount of time.

I did it again today, during a meeting, wrapping the topia term
extraction http://pypi.python.org/pypi/topia.termextract/ library that
everyone's been talking about in a brain-dead stupid AppEngine web
interface http://github.com/straup/gae-termextractor/tree/master .

This makes me happy. 

Flickr For Busy People http://flickrforbusypeople.appspot.com/ 

should work in any web browser (including the iPhone) and is free for anyone to 
use, at least until it starts to become a financial burden, but if you'd rather run your 
own copy, the source code for both FlickrApp and Flickr For Busy People is
available on the GitHub: 

• gae-FlickrApp http://github.com/straup/gae-flickrapp/tree 

• gae-flickrforbusypeople https://github.com/straup/gae-flickrforbusypeople/tree

The FlickrApp package contains a Hello World style
example http://github.com/straup/gae-flickrapp/tree/d5061ebb87f17cbfd98924d6e617ffff05d13ac9/example which
is about the easiest way to get started using the package. The documentation is still 
a bit sketchy while the dust settles but if you want a complete example of how to 
use it then the code that runs Flickr For Busy People is a good place to look. 

In the meantime, I've started to work on a similar SSO-like package for 
OAuth http://oauth.net/ based APIs. I'm totally thrilled with OAuth, the idea,
but, so far, everything else about the spec makes me cry so it's still sort of mostly an
exercise in just trying to build something with it. Since more and more people are
using it, though, it would be a handy tool to have with which to treat the web like a
big and shiny shell script. The code https://github.com/straup/gae-OAuthApp/tree ,
such as it is, totally doesn't work yet. I'll get around to finishing it
eventually but if someone wants to do it first that would also be great. 



We now return you to the margins of the day. 
Five Things I Did This Week



Fake Subway APIs 

Burning Man 2009 Tiles 

The Mirror Projectization of Fli... I Mean,... 

Suggestify 
I Was That Guy 



Fake Subway APIs 



http://fakesubwayapis.appspot.com/

This is just a little thing, a small piece of plumbing, to facilitate next month's 
"Thing (with the Stuff)" and it's pretty much what it sounds like. It doesn't even do 
much, right now, except return the name of a subway, or train, station for a short code.

Like this: 

# GET http://fakesubwayapis.appspot.com/bart/getinfo/24th

<rsp stat="ok">
  <station code="24th" service="bart">
    <url>http://www.bart.gov/stations/24th/</url>
    <name>24th St. Mission (SF)</name>
  </station>
</rsp>

So far there are only four supported lines: the BART http://bart.gov/
in San Francisco, the STM http://www.stm.info/ in Montreal, the
Underground http://www.tfl.gov.uk/ in London and the National Rail
Service http://www.nationalrail.co.uk/ in the whole of the UK. The reasons
why are two-fold: they have a unique short code for each station and they have
corresponding web pages for each one of those identifiers.

At the two extremes of "getting it" are
BART http://www.bart.gov/stations/mont/ with lovely, detailed
Places http://www.flickr.com/places/ -style pages for each of the stops in their
network and Transport For
London http://www.tfl.gov.uk/tfl/livetravelnews/departureboards/tube/de ... LineCode=piccadilly&stationCode=Pic
which requires a handful of workarounds and shims to account for the fact that
there are only arrivals and departures pages for station-plus-line combinations.






What they all share, though, is a lack of an API so, over the weekend, I
banged out an AppEngine thingy http://fakesubwayapis.appspot.com/ that
simply wraps a bunch of static lists in plain-vanilla XML-over-HTTP glue. I have
two hopes for Fake Subway APIs: 1) that they be replaced by real, live APIs
operated by their respective transit services; 2) that, in the meantime, people submit
patches and data (geo information, for example) and help to shape the kinds of
properly maintained APIs that we can hope to use in the future.
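The "glue" really can be that thin. Here is a minimal sketch, using only Python's standard library, of turning a static station list into the getinfo response shown earlier. The single-entry STATIONS table, the handle function and the stat="fail" error shape are hypothetical, and the real service runs on AppEngine rather than as a bare function.

```python
import xml.etree.ElementTree as ET

# Hypothetical static list: (service, short code) -> (station name, web page).
STATIONS = {
    ("bart", "24th"): ("24th St. Mission (SF)", "http://www.bart.gov/stations/24th/"),
}


def getinfo(service: str, code: str) -> str:
    """Return an XML <rsp> document describing one station."""
    rsp = ET.Element("rsp")
    entry = STATIONS.get((service, code))
    if entry is None:
        rsp.set("stat", "fail")
        ET.SubElement(rsp, "err", msg="station not found")
        return ET.tostring(rsp, encoding="unicode")
    name, url = entry
    rsp.set("stat", "ok")
    station = ET.SubElement(rsp, "station", code=code, service=service)
    ET.SubElement(station, "url").text = url
    ET.SubElement(station, "name").text = name
    return ET.tostring(rsp, encoding="unicode")


def handle(path: str) -> str:
    """Route a path like /bart/getinfo/24th to getinfo()."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[1] != "getinfo":
        rsp = ET.Element("rsp", stat="fail")
        ET.SubElement(rsp, "err", msg="unknown method")
        return ET.tostring(rsp, encoding="unicode")
    service, _, code = parts
    return getinfo(service, code)


print(handle("/bart/getinfo/24th"))
```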

As usual, the source code is available for poking on the GitHub: 
http://github.com/straup/gae-fakesubwayapis
Burning Man 2009 Tiles




A year ago, when Dan http://www.geobloggers.com/ and I were first
thinking about adding custom Flickr map tiles for Burning
Man http://code.flickr.com/blog/2008/08/25/flickr-heart-burning-man-heart-openstreetmap/ ,
we wondered what we would do in a year's time. How
would we deal with the mechanics of historical tiles, especially ones for a "location" 
that physically moves by a not insignificant amount every year? 

As usual, we punted. After all we had a year to figure it out. 



And then a year rolled
around http://www.flickr.com/photos/straup/sets/72157612087736978/
and Mikel http://brainoff.com/weblog/ and I were sitting on a train in
Holland, unaware that we were travelling in the wrong direction, happily talking
about the Burning Man (Earth) 2009
APIs http://earth.burningman.com/api/docs/ and how we could wire them
up and do machine tag extras magic on
Flickr http://blog.flickr.net/en/2009/08/28/burning-man-theme-camp-machine-tags/ .
Which has nothing to do with map tiles, really. It's just a nice
story.

It proved to be a bit more cumbersome to grab the tiles this year, as there 
were distractions and other hiccups on both sides, so we lost a bit of the excitement 
by not having them live the day Burning Man started (not that anyone really posts
their photos from the Playa). 

What we did do was work out (read: even more bubblegum and duct
tape http://www.aaronland.info/weblog/2009/06/01/bubblegum/#sotm09 )
how to toggle the map tiles displayed based on whether a photo was taken in 2008
(green tiles! http://www.flickr.com/photos/sgoralnick/map/?photo=2821761695&zl=5 ) or 2009 (yellow
tiles! http://www.flickr.com/photos/nosamk/map/?photo=3930694996&zl=4 ).
I don't know if it's the first "time-aware map tile
set http://twitter.com/mikel/status/3986163485 " like Mikel said but I'm
glad we're doing it. There's still lots of work to do with all of this stuff, but at least 
it's a start. 
The Mirror Projectization of Fli... I Mean, Galleries!



(Screenshot: gallery photos "Scream" by Yogi and "aaarghhhh...!!!" by @ marina @.)




We launched Galleries on Monday http://www.flickr.com/galleries .
Five days later there were 25,000 (and counting)
galleries http://blog.flickr.net/en/2009/09/18/gallery-o-rama/ .



For as long as Galleries took to get out the door it was also a deliciously fun 
project to work on. Whether it's the scale of our dreams or just because everything is 
so much bigger now, projects can sometimes feel overwhelming on good days and 
like moving mountains on bad ones. It's worth it (on the good days, anyway) but 
Galleries, done with a fierce eye towards "simple", was a nice echo of earlier
times. 



There are plenty of things left to do with Galleries but this feels like a good 
place to start (or stop, for a while) and to step back and watch what kind of magic 
users make of it. So far it's been delightful. Galleries were designed to allow, and 
encourage, users to curate other people's stuff but to do so with a constraint (a 
maximum of 18 photos per gallery) that would serve to highlight the relationship
between each one of the photos. After that, it's up to each user to decide what their
galleries are about, whether it's
squirrels http://www.flickr.com/photos/thejacksons/galleries/721576219802
or outer
space http://www.flickr.com/photos/royalobservatory/galleries/7215762238
or, if you're Mike Monteiro, bloody
nipples http://www.flickr.com/photos/dorkmaster/galleries/ (yes, bloody
nipples).
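The 18-photo constraint is easy to picture as a rule enforced whenever a photo is added. Here is a minimal sketch with hypothetical class and method names; the check that a gallery holds only other people's photos is an assumption based on the "other people's stuff" design above, not a description of Flickr's code.

```python
from typing import List

MAX_PHOTOS = 18  # the per-gallery limit described above


class GalleryFullError(Exception):
    pass


class Gallery:
    def __init__(self, owner: str, title: str) -> None:
        self.owner = owner
        self.title = title
        self.photos: List[str] = []

    def add(self, photo_id: str, photo_owner: str) -> None:
        """Add someone else's photo, enforcing the size limit; duplicates are ignored."""
        if photo_owner == self.owner:
            raise ValueError("galleries are for other people's photos")
        if photo_id in self.photos:
            return
        if len(self.photos) >= MAX_PHOTOS:
            raise GalleryFullError(f"a gallery holds at most {MAX_PHOTOS} photos")
        self.photos.append(photo_id)


squirrels = Gallery(owner="me", title="Squirrels")
for n in range(MAX_PHOTOS):
    squirrels.add(f"photo{n}", photo_owner="someone_else")
print(len(squirrels.photos))
```

The limit is the interesting design choice: with only 18 slots, every photo in a gallery has to earn its place next to the others.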

Reading through the comments on galleries and watching people 

Twitter http://search.twitter.com/search?q=flickr+galleries about how
excited they are to be included in other people's galleries is the icing on the cake. 

These are a few of my favourites, spotted in the first days since launch:

• All the mini-horses... http://www.flickr.com/photos/carieellen/galleries/721576

• A perfect moment http://www.flickr.com/photos/dunstan/galleries/72157622

• Welcome to... http://www.flickr.com/photos/meg/galleries/72157622390833078

• Kid+Cat Scream http://www.flickr.com/photos/kid_curry/galleries/72157622

Finally, this one was extra special for me and Heather, having built and
tended to the Mirror
Project http://web.archive.org/web/20061017192405/www.mirrorproject.com/
(which I've been known to describe as "Flickr, with no ambition") so many years
ago. I hope you'll forgive me this one small indulgence:




Hats off to 
Adrienne http://www.flickr.com/photos/heydrienne/galleries , 
Heather http://www.hchamp.com/ , Jude http://matsalla.ca/ and 
Shanan http://shanand.blogspot.com/2009/09/unlocking-curation-on-flickr.html . It was grand!
Suggestify



Suggestify is this month's "Thing (with the Stuff)".

The Thing being the ability to suggest geotags for other people's photos on 
Flickr (which I guess makes the Stuff the part where it runs as an API application 
on top of AppEngine which, in turn, was the genesis for the gae-FlickrApp
library http://github.com/straup/gae-flickrapp ). It lives here: 

http://suggestify.appspot.com 

It is still clunky around the edges and lacks polish and proper UI love but it
does work and after a couple months of tiresome hand-wringing about not finding
the time to make the shiney-shiney I've just decided to push it out the door.

Here's the short version: 




(Screenshot: a pending suggestion reading "slash kittens suggests this photo is located at latitude 37.7944591557 and longitude -122.401996769 ... (They also think it was taken indoors.) In human-speak, that's Financial District, San Francisco, CA, US, United States.", with options for who can see the location information for the photo (anyone, contacts, friends, family, friends and family, just me), where the photo was taken (indoors, outdoors, I'd rather not say) and a close button.)



The long version is over here: 



http://suggestify.appspot.com/example 



The in-between version goes something like this: 



• Users sign up (and in) using their existing Flickr accounts. 

• Suggestions are added by selecting another user's photos from a carousel and
then positioning cross-hairs on a
map http://www.aaronland.info/weblog/2009/06/01/bubblegum/#iamhere .
At the moment there is no magic drag-and-drop UI love.

• Suggestions are stored on the site, pending review by the actual photographer.
This is one of the stickier problems still to work out: how to notify another
user that someone has added a suggestion to one of their photos without being
spamtastic about it? For now, I've opted to be more conservative than not,
which means that, in many cases, user A will need to tell user B that they've
added a suggestion. User B, if they choose, can set up an email address where
they'll be notified of new suggestions, but otherwise this is very much a loose
end to be sorted out. Suggestions, no pun intended, are welcome. Photo owners
are notified of pending suggestions by (Flickr) comments left on behalf of
the suggestor or by email notifications. The details are described below but the
short version is that both may be opted out of.

• A user whose photo has been suggestified may choose to approve or reject the 
suggestion or even "block" the user from adding any more suggestions to their 
photos. A user may also choose to prevent anyone from suggesting locations 
for their photos. They'll still need to sign up for the site, so that we can 
validate who they are, but after that they never need to come back if they really 
want nothing to do with the project. 

• If a suggestion is approved and the geo information is public then a
geo:suggestedby= machine
tag http://code.flickr.com/blog/2009/07/06/extraextraextra/ is
added to the photo. This provides a useful marker to indicate which photos
were geotagged with suggested locations and a little bit of recognition and
thanks for the person doing the suggestion. Here are all the photos tagged
geo:suggestedby= http://www.flickr.com/photos/tags/geo:suggestedby=*
so far.

• There are conversations afoot to provide facilities to make it easier for the site
to work with (trusted) robots. It's a pretty simple model so the only thing left
should be for me and the robot farmers to write some code.
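The review flow in the list above can be sketched as a tiny state machine. This is a minimal, hypothetical illustration rather than Suggestify's actual code (which lives in the gae-suggestify repository): the class and function names are mine, and using the suggestor's user name as the value of the `geo:suggestedby=` machine tag is an assumption, since the post only names the tag.

```python
from dataclasses import dataclass, field

# Hypothetical model of the suggestion flow; not Suggestify's real code.

@dataclass
class Owner:
    user: str
    allow_suggestions: bool = True              # owners may opt out entirely
    blocked: set = field(default_factory=set)   # suggestors this owner has blocked

@dataclass
class Suggestion:
    suggestor: str
    lat: float
    lon: float
    status: str = "pending"                     # pending -> approved | rejected

class SuggestionError(Exception):
    pass

def suggest(owner: Owner, suggestor: str, lat: float, lon: float) -> Suggestion:
    """Store a pending suggestion unless the owner opted out or blocked the suggestor."""
    if not owner.allow_suggestions:
        raise SuggestionError("owner does not accept suggestions")
    if suggestor in owner.blocked:
        raise SuggestionError("suggestor is blocked by this owner")
    return Suggestion(suggestor, lat, lon)

def approve(s: Suggestion, geo_is_public: bool) -> list:
    """Approve a suggestion and return the tags to add to the photo."""
    s.status = "approved"
    # Assumption: the machine tag value is the suggestor's user name.
    return [f"geo:suggestedby={s.suggestor}"] if geo_is_public else []

def reject(s: Suggestion, owner: Owner, block: bool = False) -> None:
    """Reject a suggestion, optionally blocking the suggestor from making more."""
    s.status = "rejected"
    if block:
        owner.blocked.add(s.suggestor)
```

Note that the tag is only returned when the geo information is public, matching the condition in the list.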

As best I can tell there aren't any glaringly obvious bugs, although so far it
has only really been tested in Firefox. There is a list of currently known knowns
that will be updated as things are fixed (or discovered):



http://suggestify.appspot.com/about#known

Suggestify's Achilles' heel is that it's still very difficult for people to find out
that someone has suggested a location for their photos. Basically the suggestor
needs to tell the photo owner and/or the photo owner needs to already be signed up
to Suggestify and have set up email notifications for new suggestions. Another
obvious way to let people know would be to post a comment on the photo
page http://www.flickr.com/services/api/flickr.photos.comments.addComment
on behalf of the suggestor. Talking to people, this seems to be the obvious
approach, but to start I've opted to be more conservative than not about this sort of
thing, mostly because I don't want Suggestify to seem like a spamtastic nag.

Update: I have started working on changes to allow for notifications through
comments. The easy part is adding the comments; that took all of five minutes. The
harder part, and the reason it may take another couple of days to deploy, is making
sure that photo owners can opt out of the feature and doing the right thing with
regard to Flickr Auth tokens: you need a token with "write" permissions to leave
comments using the Flickr API, but that shouldn't necessarily be required for
someone who just wants to make a suggestion (and only needs a "read" token to
prove who they say they are on Flickr) and doesn't care about adding comments.
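The read/write split comes down to comparing permission levels. Flickr auth tokens are granted as "read", "write" or "delete", each level including the ones before it. The helper names below are hypothetical; this is a sketch of the check, not Suggestify's code.

```python
# Flickr auth permission levels, weakest first; each level includes the
# ones before it, so a "write" token can also do everything "read" can.
PERMS = ("read", "write", "delete")

def has_perm(granted: str, required: str) -> bool:
    """True if a token with `granted` perms is enough for `required`."""
    return PERMS.index(granted) >= PERMS.index(required)

def can_suggest(granted: str) -> bool:
    # Proving who you are on Flickr only needs a "read" token.
    return has_perm(granted, "read")

def can_comment(granted: str) -> bool:
    # Adding a comment modifies a photo as that user, so it needs "write".
    return has_perm(granted, "write")
```

A suggestor who only grants "read" can still make suggestions; the comment step is simply skipped for them.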

Update (my update): After some convincing I have enabled comment
notifications, by default, with the ability for photo owners to opt out. Here's what
the site says about it http://suggestify.appspot.com/about :



When someone suggests a location for another person's photo 
Suggestify tries to post a (Flickr) comment with a handy link back to the 
suggestion on the suggestor's behalf. Like this: 



slash kittens says:

I've suggested a location for this photo over at the Suggestify project.

I think it was taken somewhere around: Financial District, San Francisco, CA,
US, United States. You can see the exact location and approve or reject this
suggestion by following this link:

http://suggestify.appspot.com/review...

If you do approve the suggestion then your photo will be automagically
geotagged!

(You can also configure Suggestify to stop these notifications from being added
to your photos or to prevent any of your photos from being "suggestified" at all in
the future.)



Photo owners may choose to opt out of comment notifications
entirely /settings/notifications even if they continue to allow
people to suggest locations for their photos. They may do this because 
they've enabled another notification mechanism and/or because they'd 
rather not have "broadcast" style comments added to their photos. It may 
make it more difficult for a photo owner to find out about a suggestion 
but that is a photographer's prerogative. 

(It's also possible that the suggestor doesn't have permissions, on the 
Flickr site itself, to add comments.) 

The ability to leave a comment requires that the suggestor grant
Suggestify "write" access to their Flickr account. That's because, in
Flickr API terms, "write" means the ability to modify a photo (adding
a comment, for example) as that user. If a user (making suggestions)
prefers that Suggestify only have a "read" token for their account then
they can still suggest locations, but comments won't be added and the
recipient will need to be notified by other means.

Users may also configure Suggestify to send email notifications when 
new locations are suggested for their photos. 

Both comment and email notifications may be configured from the
settings /settings tab on the site.
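The rules in that excerpt reduce to a small decision about notification channels. This Python sketch is hypothetical (the settings fields and function name are mine, not Suggestify's), with comment notifications on by default as described:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class OwnerSettings:
    """Hypothetical per-owner settings mirroring the options described above."""
    comment_notifications: bool = True   # on by default; owners may opt out
    email: Optional[str] = None          # optional address for email notifications

def notification_channels(owner: OwnerSettings, suggestor_perms: str,
                          suggestor_may_comment: bool = True) -> List[str]:
    """Decide how to tell a photo owner about a new suggestion.

    suggestor_perms is the Flickr auth level the suggestor granted ("read",
    "write" or "delete"); suggestor_may_comment covers the case where Flickr
    itself doesn't let the suggestor comment on the photo.
    """
    channels = []
    if (owner.comment_notifications
            and suggestor_perms in ("write", "delete")
            and suggestor_may_comment):
        channels.append("comment")
    if owner.email:
        channels.append("email")
    # An empty list means the owner has to be told some other way.
    return channels
```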



I've also posted the source code for the site over on the GitHub:

http://github.com/straup/gae-suggestify

I'm doing this because I would love for people to help make this better,
whether it's by submitting patches specific to
Suggestify http://suggestify.appspot.com/ or by setting up their own
(private, semi-public, whatever) instance running on their own servers (remember,
though, that this is still a Google AppEngine thingy which hasn't been tested on
AppDrop http://waxy.org/2008/04/exclusive_google_app_engine_ported_to_ ).

Talking to people, it seems clear that there's a real desire for the ability to
suggest locations for other people's photos (Flickr
Commons http://www.flickr.com/commons/ , anyone?). This is what I can offer
today, warts and all, but hopefully people will find it useful enough to bother
investigating and building on.

If nothing else, it allowed Neb to tell me where this video in
Amsterdam http://www.flickr.com/photos/straup/3710672229/in/set-72157621283220147/
was shot!



Meanwhile, obvious questions will remain unanswered. 
I Was That Guy



I was that guy. Who cancels at the last minute. 

I was planning to attend Conflux http://confluxfestival.org/ this
year and was scheduled to do ... something for the big DIY Conflux 

City http://confluxfestival.org/2009/events/conflux-city/ event on 

Sunday. I submitted a proposal to do a talk about 

Clustr http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/ and 

Flickr-derived shapes http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/
we've been working on, asking the question: what else could the tools
and software we've developed be used for?



The software that generates those representations was designed to be
general enough to use with any dataset consisting of a set of latitudes
and longitudes. Given the chance, what are the dinner-time, war-time and
drunken kitchen-party stories that all the other places we have known
would tell? What are the shapes of history?



Conflux seemed like the ideal place to ask those kinds of questions and I was
looking forward to the challenge of encouraging an entirely new audience to think
of ways that Clustr might be useful to the work they are doing. I was even looking
forward to the challenge of figuring out where, and how, to do a presentation
outside the traditional "four walls and a projector". I joked about doing a walking-tour,
stand-up, Bob Dylan-style presentation with big cardboard slides on the High
Line http://www.flickr.com/photos/straup/3536556698/ .

Circumstances dictated, though, that by the start of the week I had barely 
even looked at the festival schedule let alone moved beyond mental sketches of 
what I was going to say, or where. Launching features and fixing the inevitable bugs 
that pop up afterwards all while moving house and managing the daily chaos of life 
at the same time will do that to you. 

So, on Wednesday I finally had to admit defeat and write the organizers to 
tell them I wouldn't be attending. 



If you're anywhere near New York City this weekend you should go in my
absence. Conflux http://twitter.com/confluxfestival/ is not without a
healthy dose of goofy but every time I've gone I've enjoyed it, on balance, and
there's always been a seed planted during the festival that seems to germinate a few
months later.




I suck. 
buckets of vessels

The City Is Here For You To Use 

Embiggen-izing for Myl ... I mean, Busy People 

(This is) Flickr Chatterbox 



The City Is Here For You To Use 



(With apologies to
Adam http://speedbird.wordpress.com/2009/10/18/the-city-is-here-for-you-to-use-very-provisional-bibliography/ .)



[Screenshot: the Flickr profile page for The Space Claw (Sutro Tower), showing its photostream and its favorite photos from other Flickr members.]




The other day Dan
asked https://twitter.com/revdancatt/status/5272789199 : "Who amongst
us will write the Building as Contacts and Related Goodness blog post?" It's worth
remembering, I think, that he already
has http://geobloggers.com/2009/04/17/2-every-building-with-a-shoebox-in-its-basement/ .



[Screenshot: the Flickr profile page for The Pointy Building (Transamerica Pyramid), showing its photostream and its favorite photos from other Flickr members.]

"I like to think of this as digital footprints, trails left behind by the many previous travelers
through the city. That somehow the building is collecting 1000s of tiny snapshots of people's
memories. They took that photo of that building, because they wanted to remember being
there. For someone that angle, position and time was important. For the building it's a way of
recording it's own history through the eyes of everybody."

- Rev. Dan Catt
Embiggen-izing for Myl ... I mean, Busy People



[Screenshot: Flickr For Busy People showing contacts' uploads from the last 2 hours, including photos from Matt Biddulph.]




I made a small change. It's very simple.
I've updated Flickr For Busy
People http://flickrforbusypeople.appspot.com/ to allow users to view
large photos from their contacts. Per the docs: "You may regret embiggen-izing
photos when one of your contacts posts 300 wedding pictures in one go but that's
your business. If a contact has uploaded more than 20 photos in a given time slice
(30 minutes, 4 hours, etc.) then embiggen-ing will be automatically disabled until
there aren't quite so many photos to show at once."

The feature is off by default but can be enabled from the settings 

page http://flickrforbusypeople.appspot.com/settings . 
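The auto-disable rule from the docs amounts to a per-contact count within each time slice. A Python sketch, with hypothetical names, where "large" and "small" stand in for whatever sizes the site actually uses:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

EMBIGGEN_LIMIT = 20  # "more than 20 photos in a given time slice"

def photo_sizes(uploads: Iterable[Tuple[str, str]], embiggen_enabled: bool,
                limit: int = EMBIGGEN_LIMIT) -> Dict[str, str]:
    """Map each contact to "large" or "small" for one time slice.

    `uploads` is a list of (contact, photo_id) pairs from the slice.
    """
    counts = Counter(contact for contact, _ in uploads)
    return {
        contact: "large" if embiggen_enabled and n <= limit else "small"
        for contact, n in counts.items()
    }
```

Because the count is per slice, the same contact flips back to large photos once a shorter or quieter slice holds 20 or fewer of their uploads.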
(This is) Flickr Chatterbox



[Screenshot: Flickr Chatterbox listing contacts with new comments (for example, "nasa hq photo has 4 new comments for 3 photos") next to a comment from squirrel monkey: "wow, who knew socks could be so popular ;) thanks for the comments!"]



I made a new thing. It's very simple.
It asks the Flickr
API http://www.flickr.com/services/api/flickr.photos.comments.getRecent
for photos belonging to your contacts that have been commented on in the last
30 minutes.

I find it useful for (re)discovering photos from my contacts that have
otherwise fallen through the cracks.

It's called Flickr Chatterbox http://flickr-chatterbox.appspot.com .
It could probably stand to have keyboard shortcuts...
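A request like the one Chatterbox makes could be sketched as follows, assuming the method in question is `flickr.photos.comments.getRecentForContacts` with its `date_lastcomment` argument (a Unix timestamp). Authentication and request signing, which this method needs, are left out, and `recent_comments_url` is a hypothetical helper name.

```python
import time
from typing import Optional
from urllib.parse import urlencode

# Flickr's REST endpoint; authentication and request signing are omitted here.
REST_ENDPOINT = "https://api.flickr.com/services/rest/"

def recent_comments_url(api_key: str, window_seconds: int = 30 * 60,
                        now: Optional[float] = None) -> str:
    """Build a request for contacts' photos commented on in the last `window_seconds`."""
    now = time.time() if now is None else now
    params = {
        # Assumption: this is the method Chatterbox relies on.
        "method": "flickr.photos.comments.getRecentForContacts",
        "api_key": api_key,
        # Unix timestamp: only photos commented on since this moment.
        "date_lastcomment": int(now - window_seconds),
        "format": "json",
        "nojsoncallback": 1,
    }
    return REST_ENDPOINT + "?" + urlencode(params)
```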
[that is all]



The Flickr Years 



The Flickr Years 



[Screenshot of four Flickr sets: "The Year of Broken Ceramics" (515 photos), "The Year That Meat Ate" (647 photos), "The Year of Cheap Shades" (457 photos, 12 videos) and "The Year of the Long Year" (365 photos, 13 videos).]



Five years ago I moved to Vancouver, on a week's notice, to work at 
Flickr http://www.flickr.com/photos/caterina/2274839/ . 

At the time I 

wrote http://aaronland.info/weblog/2004/12/31/5577/ : "I am excited about 

both the work and the chance to live in Vancouver but in an equal action/reaction
kind of way the thought of leaving Montreal crushes me every time I think about it. I
don't know where the balance is between nurturing the roots you lay and being able
to let go and explore. ...I have reservations but it seems like too good an
opportunity and one that I would always wonder about if I'd said no."

Five years ago I asked Stewart "Why me?" and he replied "Because you're 
one of us". I will always be grateful to him for that and for the chance to help prove 
that, yes, it really is possible. 

And now it's time to let go, again, and explore. 



It is difficult and sad to leave Flickr but I have no regrets. If you asked me 
whether I'd do it again and what I'd do differently I'd tell you that I'd do it again in a 
heartbeat and the only thing I'd change would be to try to do it harder and louder 
and faster than we already did. 




The good news is that I've accepted a position to frolic around and play with
the trouble-makers that are Stamen Design http://www.stamen.com/ because
"it seems like too good an opportunity and one that I would always wonder about if
I'd said no".

It's not often you get to say something like that twice in a row and, in the
immortal words of Gibby Haynes http://www.youtube.com/watch?v=JYGoougMHSQ&feature=PlayList&p=30D6BE9BD857C56D&playnext=1&playnext_frc :
"It's better to regret something you have done than to regret something you haven't
done."

That's what I told myself five years ago, anyway. 
Somewhere over Nevada on the way to New York, the airplane wireless
places me at Boston's Logan airport. The future is here, Myles, it's just not sure 
where that is exactly. 

I mention that only because I made a thing, sitting in the airport in
Tampa http://www.flickr.com/photos/straup/4244496606/ , at the beginning
of the week. I made a book.

I run a cron job every day that calls the Twitter API, fetches my most
recent updates and saves each one as a separate XML file to a local computer in
nested YYYY/MM/DD folders. That's all it does. It's been running every day since
April 2008 and, until now, I've not done anything with all those files. One reason
that I run the cron job in the first place is that not backing up your stuff on third-party
services is just silly. That's not meant as a judgment or a finger-pointy
thing http://search.cpan.org/dist/Net-Flickr-Backup , against any
particular company or cloud-castle http://delicious.com/straup/cloudcastles .
It's just that many copies keep stuff safe.
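The storage layout is simple enough to sketch. The actual backup script is Perl and also talks to the Twitter API; this Python sketch covers only the nested YYYY/MM/DD layout, and the function names and `<status_id>.xml` file naming are assumptions.

```python
import os
from datetime import datetime, timezone

def status_path(root: str, status_id: int, created_at: datetime) -> str:
    """Return root/YYYY/MM/DD/<status_id>.xml for one message (dates in UTC)."""
    day = created_at.astimezone(timezone.utc)
    return os.path.join(root, f"{day:%Y}", f"{day:%m}", f"{day:%d}", f"{status_id}.xml")

def save_status(root: str, status_id: int, created_at: datetime, xml_text: str) -> bool:
    """Write one message to disk; returns False if an earlier run already saved it."""
    path = status_path(root, status_id, created_at)
    if os.path.exists(path):
        return False
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(xml_text)
    return True

def year_files(root: str, year: int) -> list:
    """List one year's saved messages, sorted by folder and then by file name."""
    out = []
    for dirpath, dirnames, filenames in os.walk(os.path.join(root, str(year))):
        dirnames.sort()  # sorting in place makes os.walk visit months/days in order
        out.extend(os.path.join(dirpath, f) for f in sorted(filenames) if f.endswith(".xml"))
    return out
```

Skipping files that already exist is what makes it safe to run the same job every day against overlapping "most recent" results.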




So I figured I would make a book of all my Twitter messages, from 

2009 http://www.flickr.com/photos/straup/sets/72157612087736978/ . 

James did this last year and his book has a whale on
it http://booktwo.org/notebook/vanity-press-plus-the-tweetbook/ . At the
time, he wrote: "When Twitter is inevitably replaced by something else, I don't want
to lose all those incidentals, the casual asides, the remarks and responses. That's all
really. This seems like a nice way to do it..."

James opted for a Proper Book layout but since I never post often enough to
create anything like a normal narrative I decided to stay sparse and use big letters
floating in whitespace, which is probably what the messages seem like as they are
posted. (The little serendipities created by having messages placed side-by-side are
just icing.) Also, aside from being a personal historical record, the whole thing makes for a
nicer, not to mention funnier, end-of-year holiday letter. I've never sent holiday
letters but I have sent one of the books to an old friend who doesn't spend much time
on the Internet and I hope it will be like getting a nice, long
postcard http://www.flickr.com/photos/straup/4247987089/ that affords the
scent of the year even if she wasn't there to see the whole thing.

Also, the thought that some day someone (probably me) might have to
OCR http://en.wikipedia.org/wiki/Optical_character_recognition the
book back in to digital form makes me laugh.

I spent a little bit of time adding an index of all the words used over the year
but eventually decided against it. Aside from the grunt work involved in generating
and laying out the index itself the addition of page numbers introduced one visual
distraction too many.

Which didn't stop me from adding QR
Codes http://en.wikipedia.org/wiki/QR_Code . Each QR code encodes the
URL of the message itself which is, after all, a thing of the web and it seemed
wrong not to honour that. I tried to make the codes themselves discreet (read: grey)
but they can still be read by the clunky 5-year-old "barcode reader" that still ships
with every Nokia phone. I'm not sure that the bottom of the page is necessarily the
best place to put QR codes but now that we are living in a world where people are
building useful barcode readers http://code.google.com/p/zxing/ , I figure
it's worth experimenting again http://www.aaronland.info/papernet .

The nuts and bolts involve a Perl script that fetches the
posts http://github.com/straup/twitter-tools/blob/master/mk_twitter_backup.pl
and a PHP script to turn them into
a PDF file http://github.com/straup/twitter-tools/blob/master/mkannual.php .
The Perl-y bits do not handle fetching stuff
in the past, at the moment. That would be pretty easy to do but it involves code to
deal with pagination and keeping track that you haven't made more than 100 API
calls in the last hour and pausing until the next hour if you have and so on. The PHP
part simply takes two arguments: the path to the directory containing all the
messages for a year and the location where your "book" should be saved.
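The missing pagination piece, fetching page after page while staying under 100 calls an hour, could be sketched in Python as below. This is not part of twitter-tools: it interprets "pausing until the next hour" as waiting until the oldest call in a rolling one-hour window expires, and `fetch_page` is a hypothetical callable standing in for the actual API request.

```python
import time
from collections import deque
from typing import Callable, List

class RollingRateLimiter:
    """Allow at most `max_calls` calls in any rolling `window` of seconds.

    The clock and sleep functions are injectable so the logic can be
    tested without actually waiting an hour.
    """

    def __init__(self, max_calls: int = 100, window: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self) -> None:
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have aged out of the window.
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Pause until the oldest call in the window expires.
            self.sleep(self.window - (now - self.calls[0]))

def fetch_all_pages(fetch_page: Callable[[int], List], limiter: RollingRateLimiter) -> List:
    """Call fetch_page(1), fetch_page(2), ... until a page comes back empty."""
    items, page = [], 1
    while True:
        limiter.wait()
        batch = fetch_page(page)
        if not batch:
            return items
        items.extend(batch)
        page += 1
```

A fixed clock-hour window would also work; the rolling window just avoids a burst of 200 calls straddling the top of the hour.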

I've been using Lulu http://www.lulu.com/ to print, so far, and if you
can get past the awfulness of all their tools for creating book covers it's been pretty
good.

The code is part of the "twitter-tools" project on
GitHub http://github.com/straup/twitter-tools . It all still needs some
work to finish the polish and packaging and maybe it warrants a complete rewrite
for someone with the ambition (or just a dislike of Perl) but, for now, it works for
me. Enjoy.
Don't worry, Myles. I'm indexing all of these blog posts. No, really.

Whatever else may be said about the acquisition by Yahoo! there are two
things that pretty much everyone agrees were good for Flickr:
Vespa http://www.flickr.com/photos/laloyd/1410021842/ (the magic
Scandawegian document index that we
("we" http://www.aaronland.info/weblog/2009/11/06/thatisall/#flickr )
use for search) and Where on Earth http://www.whereonearth.com/ (or WOE,
since renamed GeoPlanet http://developer.yahoo.com/geo/geoplanet/ but I
can never remember to call it that and the Y!Geo people always get cross with
me...) which we used for all things geo-related. The nice thing about both is that
they lent themselves to being used for all kinds of stuff that they weren't capital-D
designed for.



Vespa made it possible to do proper search as we grew and grew and grew
(and at a time when we were federating the databases) but it also let us add radial
queries, support for machine
tags http://www.flickr.com/groups/api/discuss/72157594497877875/ (the
machine tag extras http://code.flickr.com/blog/2009/07/06/extraextraextra/ and
hierarchies http://code.flickr.com/blog/2008/12/15/machine-tag-hierarchies/
stuff is still done using MySQL, for what it's worth) and the
ability to roll up (or facet) all that data to show the top tags for a
place http://www.flickr.com/services/api/flickr.places.tagsForPlace.htm
or build API applications like Flickr For Busy
People http://flickrforbusypeople.appspot.com/ and [ redacted ]. Vespa
is a non-trivial piece of infrastructure to maintain and it's not a magic pony but I
sure do miss having it to play with.

WOE made it possible to let users geocode their photos globally instead of
being limited to a US/Euro-centric subset largely determined by marketing and
target demographics, to reverse-geocode (with a little help and coaxing along the
way) machine-readable locations back in to human-readable names and to use a
unique set of identifiers for places (WOE IDs) that can be shared, in a hawt
linked-data kind of way, with the rest of the Internets.

Both of these are magic shiney boxes and very much proprietary. Yahoo! did
more than most by deciding to release all of the WOE IDs and names under a
Creative Commons license and we tried to do our part by also releasing the
shapefiles for places (neighbourhoods, localities, etc.) derived from geotagged
photos http://boundaries.tomtaylor.co.uk/ , but there's nothing wrong with
a business wanting to hold on to their secret sauce.






[fuzzy][14] fetch place type 22 w/radius - count 2
[fuzzy][14] including airport London Heathrow Airport in result set
[fuzzy test][14] begin with place type 22
[fuzzy test][14][22877] is point in fuzzy bbox for Heathrow, with fuzziness of 2
[fuzzy test][14][22877] Heathrow (of type 22) is a possibility
[fuzzy test][14][29884] is point in fuzzy bbox for New Bedfont, with fuzziness of 2
[fuzzy test][14][29884] New Bedfont (of type 22) is a possibility
14 is not a neighbourhood
[fuzzy test][14][23382429] London Heathrow Airport (of type 14, treated as 22) is a possibility
[fuzzy test][14] test 3 possibilities for place type 22
[fuzzy test][14][22877] Heathrow - contained by an airport, that's fucking stupid
[fuzzy test][14][29884] New Bedfont - contained by an airport, that's fucking stupid



Still, I miss them. 

So I finally got around to looking at
Solr http://lucene.apache.org/solr/ , a web-based wrapper for Doug
Cutting's Lucene document
indexer http://www.lucidimagination.com/blog/2009/12/24/the-apache-lucene-ecosystem-my-view-of-2009/ .
I don't know enough about search thingies
to offer any kind of educated opinion about the differences between Vespa and Solr
under the hood but conceptually Solr does most of the things that I'd come to take for
granted at Flickr, including a brain-dead easy HTTP interface for searching and
updating the index. I don't know that Solr would ever be able to index as fast as
Vespa, at least not in the kind of environment we were running. There are always
tricks and
solutions http://jeremy.zawodny.com/blog/archives/11502.html but then
again it's probably not something most people need to worry about. At least not to
get started.

Facetting. Solr does it out of the
box http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr .

Here are all the places in the GeoPlanet
database http://developer.yahoo.com/geo/geoplanet/data/ with the word
"Museum" in their name(s) facetted by country:



# /select?q=museum&facet=true&facet.field=iso&facet.mincount=1&rows=0&wt=json



[Output: the "facet_counts" section of the JSON response, whose "facet_fields" entry lists the number of matching places for each ISO country code.]



Which looks pretty much like it could be generated by a vanilla GROUP BY 
statement in a relational database. And it could. The point is not to join the chorus 
of voices heralding the end of SQL but rather to get excited about some of the 
functionality of relational databases, which frankly aren't that good at search, being 
added to ... search engines. 
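To make the GROUP BY comparison concrete, here is a Python sketch using the standard library's sqlite3 module. It assumes a hypothetical `places` table with `name` and `iso` columns holding toy rows (not real GeoPlanet counts), and also shows how Solr's JSON writer returns `facet_fields` by default: a flat list of alternating terms and counts.

```python
import sqlite3
from typing import Dict, List

def facet_counts(flat: List) -> Dict[str, int]:
    """Turn Solr's flat JSON facet list [term, count, term, count, ...] into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))

def group_by_counts(conn: sqlite3.Connection, word: str) -> Dict[str, int]:
    """The relational equivalent: matching places per country code.

    LIKE does case-insensitive (ASCII) substring matching, which is only a
    rough stand-in for Solr's tokenised full-text match on place names.
    """
    rows = conn.execute(
        "SELECT iso, COUNT(*) FROM places WHERE name LIKE ? "
        "GROUP BY iso ORDER BY COUNT(*) DESC, iso",
        (f"%{word}%",),
    )
    return dict(rows.fetchall())
```

The difference the post is excited about is where the work happens: the facet comes back from the search engine alongside the query, instead of needing a second trip to a database with the right index.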

You can facet on any field that's been indexed, which means that it's not
without a cost, but so far my experience has been that for an initial exploration of a
dataset http://berglondon.com/blog/2009/10/23/toiling-in-the-data-mines-what-data-exploration-feels-like/ ,
at least, it's often easier and
cheaper to update a Solr schema and refeed all the data (even from a relational
database) than it is to add multiple (and always with the
LEFTMOST-ness http://www.aaronland.info/weblog/2007/08/24/aware/#mtdb ) indexes
to the same database.



The other thing it can do is facet
dates http://www.packtpub.com/article/faceting-in-solr-1.4-enterprise-search-server .
That's pretty exciting if for no other reason than so-called "linear
searches" (so-called because I can't think of a better name) are boring and kind of
pointless. Search results ranked by a single criterion are still useful but not for
everything and certainly not for a lot of stuff that the Internets have been
responsible for. It's not that facetting by date (there were 100 photos of kittens
uploaded yesterday, but only five the day before and 300 the month before that) is
the one-true solution to all our relevancy woes but it's a really useful way to help
people glean some basic understanding from a giant pile of possibilities.
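For the date case, a Solr 1.4-era query would use the `facet.date`, `facet.date.start`, `facet.date.end` and `facet.date.gap` parameters (later releases moved this to `facet.range`). The sketch below builds such a query string and, alongside it, does the same per-day bucketing locally; the `date_upload` field name and the 30-day window are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable
from urllib.parse import urlencode

def date_facet_params(query: str, field: str, start: str = "NOW/DAY-30DAYS",
                      end: str = "NOW/DAY+1DAY", gap: str = "+1DAY") -> str:
    """Query string for a Solr 1.4-style date facet (one bucket per `gap`)."""
    return urlencode({
        "q": query,
        "rows": 0,              # only the facet counts, no documents
        "facet": "true",
        "facet.date": field,
        "facet.date.start": start,
        "facet.date.end": end,
        "facet.date.gap": gap,
        "wt": "json",
    })

def count_by_day(timestamps: Iterable[datetime]) -> Counter:
    """The same idea done locally: number of matches per calendar day."""
    return Counter(ts.date() for ts in timestamps)
```

For example, `date_facet_params("kittens", "date_upload")` would be appended to a `/select?` request.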




And geo. 



There's a lot of work being done to add spatial searching to the next major
release (1.5) of Solr http://issues.apache.org/jira/browse/SOLR-773 but in
the meantime the nice folks at JTeam in the Netherlands simply read the thread on
adding geo support on the Solr bug
tracker http://blog.jteam.nl/2009/08/03/geo-location-search-with-solr-and-lucene/
and implemented the basics (radial queries) as a Solr
plugin http://issues.apache.org/jira/browse/SOLR-773 . There's a little bit
of fiddling that needs to happen in the Solr config files to get it working but
otherwise it's about as easy as dropping a .jar file in a specific
folder http://github.com/straup/solr-geoplanet/tree/master/lib/ .



Here are all the places with "garden" in their name within 10 kilometers of 
London's Kensington Garden (which curiously isn't part of GeoPlanet): 



# Hey look! See how this example says "long" and the docs on the JTeam
# site say "lng". That's because the docs are wrong: use "long".
# Note the use of the fl parameter for the sake of brevity.
# /select?q={!spatial lat=51.500152 long=-0.126236 radius=10 calc=arc unit=km}garden&fl=woeid,name,

<result name="response" numFound="3" start="0">
<doc>
<str name="name">Covent Garden</str>
<int name="woeid">17043</int> <!-- http://www.flickr.com/places/17043 -->
<double name="longitude">-0.124179084074</double>
<double name="latitude">51.5143549184</double>
</doc>
<doc>
<str name="name">Hatton Garden</str>
<int name="woeid">20094322</int> <!-- http://www.flickr.com/places/20094322 -->
<double name="longitude">-0.109189265226</double>
<double name="latitude">51.5180002638</double>
</doc>
<doc>
<str name="name">New Covent Garden</str>
<int name="woeid">20094338</int> <!-- http://www.flickr.com/places/20094338 -->
<double name="longitude">-0.143671513716</double>
<double name="latitude">51.4805055347</double>
</doc>
</result>
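The `calc=arc` parameter in the query above presumably asks the plugin for great-circle distances. As a rough illustration of what a radial filter does (not the JTeam plugin's actual code), here is a haversine sketch in Python; the 6371 km mean Earth radius and the function names are assumptions, and the test points are the query centre and Covent Garden from the response above.

```python
import math
from typing import Iterable, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the plugin's own constant may differ

def arc_distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(center: Tuple[float, float],
                  places: Iterable[Tuple[str, float, float]],
                  radius_km: float) -> List[str]:
    """Names of (name, lat, lon) places no further than radius_km from center."""
    lat0, lon0 = center
    return [name for name, lat, lon in places
            if arc_distance_km(lat0, lon0, lat, lon) <= radius_km]
```

A real index avoids computing this for every document (typically by pre-filtering on a bounding box), but the distance test itself is the same idea.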



At Flickr, we built
"Nearby" http://code.flickr.com/blog/2009/02/09/things-im-standing-next-to/
using a similar approach in Vespa.

This isn't going to replace 
PostGIS http://mojodna.net/2009/12/05/the-os-x-spatial-stack.html any 
time soon but it will allow you to filter your queries by a specific radius, using a 
plain old GET request, which is a Good Thing ™ . There's a lesson in that which is: 
Even when you actually figure out how to use 

PostGIS http://mojodna.net/2009/12/05/the-os-x-spatial-stack.html all 
the weird setup hoops and syntax quirks required to do anything make it feel like 
you might as well be stabbing yourself in the face. It is a very good spatial database 
but it's also not very much fun. There's also the part where Solr has 

replication http://wiki.apache.org/solr/SolrReplication and Postgres
doesn't http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Com .

So, the obvious candidate for the Solr-love would be a local copy of all your 

Flickr data, exported through the API http://www.flickr.com/services/api . 

Maybe an extension to
Net::Flickr::Backup http://search.cpan.org/dist/Net-Flickr-Backup/ that



writes to Solr at the same time. I've actually started to do that but then decided to try 
something a bit simpler to get my feet wet: Why not import all of the GeoPlanet 
data set in to Solr? It's not immediately obvious why you'd want to use Solr instead 
of a plain old database but I figured that the ability to facet the parent and child 
relations and adjacencies alongside all the usual search-y bits for place names made 
it worth at least trying. 

The data itself is exported as tab-separated files, with aliases (alternate 
names) and adjacencies for every WOE ID stored in separate files. This is fine 
except for the part where reading those files in to memory and feeding the Solr 
index, at the same time, is an outrageous memory hog. Instead I opted for exporting 
the aliases and adjacencies to a SQLite http://www.sqlite.org/ database. The
actual list of places is read, line by line, using a simple Python script that pokes the 
temporary database per WOE ID. 
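The alias half of that approach might look something like this Python sketch. The column names (`WOE_ID`, `Name`, `Name_Type`, `Language` for aliases; `WOE_ID`, `ISO`, `Name`, `Language`, `PlaceType`, `Parent_ID` for places) are my assumption about the GeoPlanet TSV headers, and the function names are hypothetical rather than taken from import.py; adjacencies would follow the same pattern with their own table.

```python
import csv
import sqlite3
from typing import Dict, Iterator, TextIO

def load_aliases(conn: sqlite3.Connection, aliases_tsv: TextIO) -> None:
    """Copy the (assumed) GeoPlanet aliases TSV into a temporary SQLite table."""
    conn.execute("CREATE TABLE aliases (woeid INTEGER, name TEXT, type TEXT, lang TEXT)")
    reader = csv.DictReader(aliases_tsv, delimiter="\t")
    conn.executemany(
        "INSERT INTO aliases VALUES (?, ?, ?, ?)",
        ((int(r["WOE_ID"]), r["Name"], r["Name_Type"], r["Language"]) for r in reader),
    )
    # Index after the bulk insert so per-WOE ID lookups are fast.
    conn.execute("CREATE INDEX aliases_woeid ON aliases (woeid)")
    conn.commit()

def iter_places(conn: sqlite3.Connection, places_tsv: TextIO) -> Iterator[Dict]:
    """Stream places one line at a time, poking the database for each WOE ID's aliases."""
    for row in csv.DictReader(places_tsv, delimiter="\t"):
        woeid = int(row["WOE_ID"])
        doc = {
            "woeid": woeid,
            "name": row["Name"],
            "iso": row["ISO"],
            "placetype": row["PlaceType"],
            "parent_woeid": int(row["Parent_ID"]),
        }
        for lang, typ, name in conn.execute(
            "SELECT lang, type, name FROM aliases WHERE woeid = ? ORDER BY lang, type, name",
            (woeid,),
        ):
            # One multi-valued field per language+type pair, e.g. alias_FRE_Q.
            doc.setdefault(f"alias_{lang}_{typ}", []).append(name)
        yield doc
```

In real use the files would be opened with `open(path, newline="", encoding="utf-8")`, and only the places file is streamed, so memory stays flat however many aliases there are.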

Then all you need to do is this: 

python import.py --solr http://localhost:8... \
    --data /path/to/geoplanet-7.4.0 \
    --version 7.4.0

(import.py: http://github.com/straup/solr-geoplanet/blob/master/bin/import.py )

It takes about an hour to index the entire dataset on my laptop. Here's what 
London (WOE ID 44418) looks like after it's been indexed: 



<arr name="adjacent_woeid">
<int>18074</int>
<int>19919</int>
<int>19551</int>
<int>14482</int>
<!-- and so on -->
</arr>

<arr name="alias_CHI_Q"><str>...</str></arr>
<arr name="alias_CHI_V"><str>...</str></arr>
<arr name="alias_DUT_Q"><str>Londen</str></arr>
<arr name="alias_ENG_V"><str>LON</str></arr>
<arr name="alias_FIN_Q"><str>Lontoo</str></arr>
<arr name="alias_FIN_V"><str>Lontoon kautta</str></arr>
<arr name="alias_FRE_Q"><str>Londres</str></arr>
<arr name="alias_ITA_Q"><str>Londra</str></arr>
<arr name="alias_JPN_Q"><str>ロンドン</str></arr>
<arr name="alias_KOR_Q"><str>런던</str></arr>
<arr name="alias_POR_Q"><str>Londres</str></arr>
<arr name="alias_SPA_Q"><str>Londres</str></arr>
<str name="iso">GB</str>
<str name="lang">ENG</str>
<str name="name">London</str>
<int name="parent_woeid">23416974</int>
<str name="placetype">Town</str>
<int name="woeid">44418</int>



See all those alias_ fields? Every language+type pair (the FRE_Q part of
the field name) is stored, but not indexed. All of the aliases, along with the default
name, are also copied to a single dynamic
field http://wiki.apache.org/solr/SchemaXml#Dynamic_fields called
names, and the boost http://wiki.apache.org/solr/SolrRelevancyCookbook
for each value is set according to the alias type. Anything ending in _V is assigned a
boost of 0.5; anything ending in _N a boost of 2.0; everything else is left unchanged
(the default boost value being 1.0). The types aren't officially documented anywhere
on the GeoPlanet site but this is what I've been able to learn about them so far:

• N is a preferred local name 

• P is a preferred English name 

• Q is a preferred name (in other languages) 

• V is a valid variant name that is unpreferred 

• S — dunno 

• A — dunno 
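The boost rules above, sketched in Python. The helper names are hypothetical, and giving the default name a boost of 1.0 is my assumption; the post only says that anything other than _V and _N is left at the default.

```python
# Boosts described above; alias types not listed keep Solr's default of 1.0.
ALIAS_BOOSTS = {"V": 0.5, "N": 2.0}
DEFAULT_BOOST = 1.0


def alias_boost(field_name):
    """Return the boost for an alias_LANG_TYPE field name, e.g. alias_FRE_Q."""
    if not field_name.startswith("alias_"):
        raise ValueError(f"not an alias field: {field_name}")
    alias_type = field_name.rsplit("_", 1)[-1]
    return ALIAS_BOOSTS.get(alias_type, DEFAULT_BOOST)


def names_with_boosts(doc):
    """Collect the default name plus every alias as (value, boost) pairs
    destined for the single 'names' field."""
    pairs = [(doc["name"], DEFAULT_BOOST)]  # assumption: default name keeps 1.0
    for field, values in sorted(doc.items()):
        if field.startswith("alias_"):
            boost = alias_boost(field)
            pairs.extend((value, boost) for value in values)
    return pairs


london = {
    "name": "London",
    "alias_ENG_V": ["LON"],
    "alias_FRE_Q": ["Londres"],
    "alias_ITA_Q": ["Londra"],
}
print(names_with_boosts(london))
# [('London', 1.0), ('LON', 0.5), ('Londres', 1.0), ('Londra', 1.0)]
```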

The idea isn't that you search for a particular alias by trying to divine what 
language and type it's been assigned but rather that you just query the names field, 
which contains all the possibilities, and sort out the language stuff on your own. 
Like this: 



# Don't laugh, there's a town called poo http://www.flickr.com/places/770487 in Spain. 
# 

# /select?q=name:poo+OR+names:Londra&fl=name,woeid,alias_ITA_Q

<result name="response" numFound="4" start="0">
<doc> 

<str name="name">Poo</str> 

<int name="woeid">770487</int> 
</doc> 
<doc> 

<str name="name">Poo</str> 

<int name="woeid">770486</int> 
</doc> 
<doc> 

<str name="name">Fernando Poo Islote</str> 

<int name="woeid">12465615</int> 
</doc> 
<doc> 

<str name="name">London</str> 

<int name="woeid">44418</int> 



<arr name="alias_ITA_Q"> 

<str>Londra</str> 
</arr> 
</doc> 
</result> 
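If you would rather build that kind of query from code than by hand, here is a small sketch that follows the "just query the names field" advice. The host and port (Solr's bundled example on Jetty's default 8983) are an assumption; adjust them for your setup.

```python
from urllib.parse import urlencode

# Assumption: Solr's bundled example running locally on its default Jetty port.
SOLR_SELECT = "http://localhost:8983/solr/select"


def names_query_url(terms, fields=("name", "woeid")):
    """Build a /select URL that ORs phrase searches on the 'names' field."""
    # Quoting each term keeps multi-word names like "Fernando Poo" together.
    # Terms containing double quotes would need escaping first.
    q = " OR ".join(f'names:"{term}"' for term in terms)
    return SOLR_SELECT + "?" + urlencode({"q": q, "fl": ",".join(fields)})


url = names_query_url(["Poo", "Londra"], fields=("name", "woeid", "alias_ITA_Q"))
print(url)
```

`urlencode` takes care of escaping the colons, quotes and spaces, which is the part that is easy to get wrong when pasting queries into a browser.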



This is not a geocoder. The Solr community is talking about adding
geocoding support to version 1.5 but until then if you need a geocoder take a look at
the work that Schuyler and FortiusOne have been
doing http://github.com/geocommons/geocoder . There might be some ways to
make Solr play a geocoder on TV between now and version 1.5 but it's definitely
not going to work using anything I've done to date. In the short term it would
probably be worth generating, and indexing, a fully qualified name (city, state,
country; that sort of thing) for each place because if you search for "Montreal
Quebec" the only match is WOE ID 26332791 (Universite du Quebec a Montreal,
UQAM) and not WOE ID 3534 (the city of Montreal). Baby steps.
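One way to generate that fully qualified name is to walk the parent_woeid links upwards. A minimal sketch with toy data: only 3534 (Montreal) comes from the text; 9001 and 9002 are placeholder IDs, not real WOE IDs, and the function name is hypothetical.

```python
# Toy parent links: woeid -> (name, parent_woeid).
# 3534 is Montreal (from the text); 9001 and 9002 are placeholders.
PLACES = {
    3534: ("Montreal", 9002),
    9002: ("Quebec", 9001),
    9001: ("Canada", None),
}


def qualified_name(woeid, places):
    """Follow parent_woeid links upwards, joining names 'city, state, country' style."""
    parts = []
    while woeid is not None:
        name, parent = places[woeid]
        parts.append(name)
        woeid = parent
    return ", ".join(parts)


print(qualified_name(3534, PLACES))  # Montreal, Quebec, Canada
```

Real GeoPlanet hierarchies include intermediate levels (counties, for example) and stop at the planet itself, so in practice you would probably keep only certain placetypes before indexing the result in its own (hypothetical) field.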

Another thing that would probably be useful is storing the complete list of
ancestors for each WOE ID. One way to do that would be to add them as machine
tags: something like a multi-valued field containing woe:PLACETYPE=WOEID
strings that could also be used to populate (using Solr's copyField
magic http://www.ibm.com/developerworks/java/library/j-solr1/ )
separate namespace, predicate and value fields (more on that below). If the order of
stuff added to a multi-valued field is always preserved that might work but I haven't
done enough poking to say for sure.
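A sketch of the ancestors-as-machine-tags idea. It builds the woe:PLACETYPE=WOEID strings and splits them into namespace, predicate and value on the client side before indexing, rather than relying on copyField. The ancestor chain is hypothetical: lowercase placetypes and placeholder IDs apart from 3534.

```python
import re

# Hypothetical ancestor chain as (placetype, woeid) pairs; 9001/9002 are placeholders.
ANCESTORS = [("locality", 3534), ("state", 9002), ("country", 9001)]

# namespace:predicate=value, with Flickr-style namespace and predicate rules.
MACHINE_TAG = re.compile(
    r"^(?P<namespace>[a-z][a-z0-9_]*):(?P<predicate>[a-z][a-z0-9_]*)=(?P<value>.+)$")


def ancestor_machine_tags(ancestors, namespace="woe"):
    """Turn (placetype, woeid) pairs into woe:PLACETYPE=WOEID strings."""
    return [f"{namespace}:{placetype}={woeid}" for placetype, woeid in ancestors]


def split_machine_tag(tag):
    """Split namespace:predicate=value into its three parts."""
    match = MACHINE_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a machine tag: {tag}")
    return match.group("namespace"), match.group("predicate"), match.group("value")


tags = ancestor_machine_tags(ANCESTORS)
print(tags)  # ['woe:locality=3534', 'woe:state=9002', 'woe:country=9001']
print(split_machine_tag(tags[0]))  # ('woe', 'locality', '3534')
```

Splitting before indexing sidesteps the ordering question entirely, at the cost of doing the work in the import script instead of in Solr.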

Version 7.4.0 of the GeoPlanet dataset shipped with a list of WOE IDs that,
for one reason or another, have been deprecated and replaced by a newer WOE ID.
The import.py script will check to see whether that list has been added to your
SQLite database. If it has, then for each WOE ID (in the places.tsv file) an additional
lookup will be done to identify and store any WOE IDs it supersedes (in a multi-valued
field called, surprisingly enough, supercedes_woeid). Every ID that's been
superseded will also be updated to reflect the WOE ID that replaces it. If there is no record
for the WOE ID that's being superseded then a stub record is created that contains
two fields: woeid and supercededby_woeid. That way, if someone passes you
a WOE ID that's been deprecated you can either return the pointer immediately or
include the provenance (backwards or forwards) for the data you do return.
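The bookkeeping described above might look roughly like this. It's a hypothetical sketch, not the code in import.py: the field names come from the text, while the function names and the IDs 100, 200 and 300 are placeholders.

```python
from collections import defaultdict


def index_changes(changes):
    """Build lookups from (old_woeid, new_woeid) pairs in both directions."""
    replaced_by = {}
    replaces = defaultdict(list)
    for old, new in changes:
        replaced_by[old] = new
        replaces[new].append(old)
    return replaced_by, replaces


def annotate(doc, replaced_by, replaces):
    """Add supercedes_woeid / supercededby_woeid to a place document."""
    woeid = doc["woeid"]
    if replaces.get(woeid):  # .get() does not grow the defaultdict
        doc["supercedes_woeid"] = list(replaces[woeid])
    if woeid in replaced_by:
        doc["supercededby_woeid"] = replaced_by[woeid]
    return doc


def stub_records(known_woeids, replaced_by):
    """Two-field pointer records for deprecated IDs with no record of their own."""
    return [{"woeid": old, "supercededby_woeid": new}
            for old, new in sorted(replaced_by.items()) if old not in known_woeids]


changes = [(100, 200), (300, 200)]  # placeholder IDs
replaced_by, replaces = index_changes(changes)
places = [{"woeid": 200, "name": "Newer place"}, {"woeid": 300, "name": "Older place"}]
docs = [annotate(doc, replaced_by, replaces) for doc in places]
docs += stub_records({doc["woeid"] for doc in places}, replaced_by)
print(docs)
```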




That's all great except for the part where the GeoPlanet data doesn't have any
coordinate data associated with it. Of course there are ways to get it
anyway http://where.yahooapis.com/v1/place/[INSERT_WOEID_HERE]?appid=[INSERT_YAHOO_APPID_HERE]
but I didn't mention that, did I? Instead, I added
some code to read in the Flickr Shapefiles (Public
Dataset) http://code.flickr.com/blog/2009/05/21/flickr-shapefiles-public-dataset-10/ . So far, only the most recent (not a donut
hole http://code.flickr.com/blog/2009/01/12/living-in-the-donut-hole/ ) shapefile is used to calculate the centroid for a WOE ID. The bounding
box is also stored as a set of individual fields: sw_latitude,
sw_longitude, ne_latitude, ne_longitude. I'm not sure that's the
right thing to do, long-term. There's an interesting paper about using Hadoop to do
GIS processing http://www.nathankerr.com/projects/parallel-gis-processing/gisonhadoop.html where coordinate data is stored using the Well
Known Text http://en.wikipedia.org/wiki/Well-known_text serialization,
which might be worth trying. Eventually, it was all starting to smell like yaks so I
opted for the simplest thing you could actually query today (I am not quite ready to
cross the river and start writing my own query plugins in
Java http://www.aaronland.info/weblog/2008/02/05/fox/#ws-decode just
yet) and moved on.
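For the curious, here is a sketch of turning one polygon ring into the fields described above, plus a Well Known Text serialization for comparison. The function names are hypothetical, and the centroid is the planar shoelace-formula centroid, which may or may not match what the import script actually computes.

```python
def bounding_box(ring):
    """Return the sw_/ne_ fields for a ring of (latitude, longitude) points."""
    lats = [lat for lat, _ in ring]
    lons = [lon for _, lon in ring]
    return {
        "sw_latitude": min(lats), "sw_longitude": min(lons),
        "ne_latitude": max(lats), "ne_longitude": max(lons),
    }


def centroid(ring):
    """Area-weighted centroid of a simple polygon via the shoelace formula.

    Treats latitude/longitude as flat x/y: a reasonable approximation for
    city-sized shapes, not near the poles or across the antimeridian.
    """
    twice_area = sum_x = sum_y = 0.0
    for i in range(len(ring)):
        y0, x0 = ring[i]
        y1, x1 = ring[(i + 1) % len(ring)]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        sum_x += (x0 + x1) * cross
        sum_y += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon")
    # Returned as (latitude, longitude) to match the input order.
    return sum_y / (3 * twice_area), sum_x / (3 * twice_area)


def to_wkt(ring):
    """Serialize a ring as a WKT POLYGON (WKT wants "lon lat" order)."""
    points = list(ring)
    if points[0] != points[-1]:
        points.append(points[0])  # WKT rings must be closed
    return "POLYGON((" + ", ".join(f"{lon} {lat}" for lat, lon in points) + "))"


square = [(0.0, 0.0), (0.0, 2.0), (2.0, 2.0), (2.0, 0.0)]
print(centroid(square))  # (1.0, 1.0)
print(to_wkt(square))
```

Individual sw_/ne_ fields can be queried today with plain Solr range queries, which is the argument for them; a WKT string would need something on the Solr side that understands geometry.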

Once you've got a copy of the shape
data http://code.flickr.com/blog/2009/05/21/flickr-shapefiles-public-dataset-10/ , importing it with import_flickr_shapefiles.py http://github.com/straup/solr-geoplanet/blob/master/bin/import_flickr_shapefi... looks like this:

    python import_flickr_shapefiles.py --flickr /path/to/flickrshapefiles_public_dataset_1.0.1.xml

There are about 5.3 million WOE IDs in the GeoPlanet dataset and only
about 175,000 shapefiles (some of which are ignored because they make the library
used to parse them cry) which is at least a start. Coverage will be good in major
European and North American cities and probably a bit weird everywhere else,
which is the nature of the beast http://www.ayman-naaman.net/2009/12/24/milgram-tagmaps-lynch-alphashape/ . There are lots
of other datasets out there and the only thing that's needed is a way to determine a
matching WOE ID http://developer.yahoo.com/geo/placemaker/ .

I've put everything up on Github: 

http://github.com/straup/solr-geoplanet 

This includes the various config files that Solr uses and the import scripts
described here, along with a modified version of
pysolr http://code.google.com/p/pysolr/ (to support boost values when
calling the add method) because I haven't gotten around to submitting a patch.
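For context on what "boost values when calling the add method" means on the wire: Solr's XML update format accepts a boost attribute on individual field elements. The sketch below builds such a message with the standard library. It is not pysolr's API and not the patch mentioned above, and the (value, boost) tuple convention is my own.

```python
import xml.etree.ElementTree as ET


def add_message(docs):
    """Build a Solr XML update <add> message.

    Each field maps to a value or a list of values; a (value, boost)
    tuple adds a boost attribute to that <field> element.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, values in doc.items():
            if not isinstance(values, list):
                values = [values]
            for value in values:
                field = ET.SubElement(doc_el, "field", name=name)
                if isinstance(value, tuple):
                    value, boost = value
                    field.set("boost", str(boost))
                field.text = str(value)
    return ET.tostring(add, encoding="unicode")


msg = add_message([{
    "woeid": 44418,
    "name": "London",
    "names": [("London", 1.0), ("LON", 0.5), ("Londres", 1.0)],
}])
print(msg)
```

One thing worth checking: as I understand it, Lucene releases of that era multiply the boosts of same-named field values in a document into a single per-field norm rather than keeping them per value, so it is worth looking at the scores you actually get back.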

There's a README http://github.com/straup/solr-geoplanet/blob/master/README file that walks through the basics of getting
everything set up but assumes that you've already spent a little bit of time reading
the Solr documentation http://lucene.apache.org/solr/tutorial.html .
One of the nicest things about getting to know Solr is that the community has done a
fabulous job documenting it http://wiki.apache.org/solr/ . I've got a bunch
of bookmarks on del.icio.us http://delicious.com/straup/solr and the Solr
book http://www.packtpub.com/solr-1-4-enterprise-search-server is
actually really useful.

Everything uses the standard "example" application that ships with Solr,
including the admin interfaces for adding and removing documents. Unless you
actually know how to configure Java web server thingies you should not deploy this
anywhere within sight of the public. (The idea is usually that you run Solr on an
internal port and have your application talk to it locally over HTTP and fuss with
the results http://wiki.apache.org/solr/SolJSON before handing them back
to users.) Tangentially related is the part where Jetty, the Java web server thingy
used by Solr, just grew support for
WebSockets http://blogs.webtide.com/gregw/entry/jetty_websocket_server ,
which might prove to be fun and exciting. In the meantime I will just hope for
friendlier Jetty documentation...
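The "fuss with the results before handing them back" step might be as small as this sketch, which reduces a Solr response requested with wt=json to the bits an application cares about. The sample response is made up for illustration, and the output shape is a hypothetical choice.

```python
import json


def simplify(raw):
    """Reduce a Solr wt=json response to a small, application-friendly dict."""
    response = json.loads(raw)["response"]
    return {
        "total": response["numFound"],
        "places": [{"woeid": d["woeid"], "name": d["name"]} for d in response["docs"]],
    }


# Made-up sample in the shape Solr returns for wt=json.
SAMPLE = ('{"responseHeader": {"status": 0}, "response": {"numFound": 1, '
          '"start": 0, "docs": [{"name": "London", "woeid": 44418}]}}')
print(simplify(SAMPLE))  # {'total': 1, 'places': [{'woeid': 44418, 'name': 'London'}]}
```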




Finally: 



I mentioned machine tags before. They were the catalyst for nosing around
with Solr. I am doing a workshop on machine tags at Museums and the
Web http://www.archimuse.com/mw2010/abstracts/prg_335002366.html ,
next year, and the Department of Talk is Cheap has mandated that I be able to
demonstrate a functional machine tag store that people can use without having to
work at Flickr. I've got the shape of it working, including the ability to (finally) do
range queries, as part of the solr-flickr http://github.com/straup/solr-flickr
project, which is also on Github.



I mention it because the whole tag/machine tag thing seems like a good stack
of data to squirt into a Solr-enabled GeoPlanet. There are plenty of places to pull
that data from on Flickr: doing a straight-up search for a WOE ID; indexing the top
tags http://www.flickr.com/services/api/flickr.places.tagsForPlace.html
(and clusters http://www.flickr.com/services/api/flickr.tags.getClusters.html )
for a WOE ID (that's how most of the I See
DOTS http://www.flickr.com/photos/straup/sets/72157622883263698/
images were created); checking for machine tags that are known to have geo
data http://code.flickr.com/blog/2009/10/19/small-bridges-to-proximate-spaces/
and then fetching second and third
order http://www.slideshare.net/mattb/mobile-social-location/29 tags
for photos added with "recent
values" http://www.flickr.com/services/api/flickr.machinetags.getRecentVa
(for a given machine tag). The list goes on and indexing and sorting all those little
bits of text is precisely the kind of thing that Solr is good at. Maybe we can build
the alpha shape http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/
equivalent of a geocoder, whatever that means.

In the end it might be easiest to feed all of that data to Solr using a bunch of
(languages that start with P) scripts but Solr does support the notion of data
import handlers http://wiki.apache.org/solr/DataImportHandler . This
includes stuff like database connections or any old URL that can be retrieved from
the
Internets http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2EHTTPDatasource and parsed with XPath http://www.w3.org/TR/xpath .
Since most of the stuff I've described above doesn't need to be authenticated it
should be as easy as building the query URI and assigning it as the
"HttpDataSource" http://wiki.apache.org/solr/DataImportHandler#Configura

This approach relies on the endpoint you're calling having default date limits (say, the
last n hours), which the Flickr API does in most cases. All you need to do is set up a
cron job to call the
delta-import http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command
command (read: HTTP GET) at set intervals and your GeoPlanet
dataset will be updated automagically, modulo any thoughts about how to deal with
multiple instances of the same tag or how to calculate boost values for terms.
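The cron side of that really is just an HTTP GET with command=delta-import. A minimal sketch, assuming a DataImportHandler registered at /dataimport on a local Solr (the path and port are assumptions; use whatever your solrconfig.xml declares):

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: DataImportHandler mounted at /dataimport on a local Solr.
DATAIMPORT = "http://localhost:8983/solr/dataimport"


def delta_import_url(base=DATAIMPORT):
    """URL that asks the DataImportHandler to run a delta-import."""
    return base + "?" + urlencode({"command": "delta-import"})


def trigger_delta_import(base=DATAIMPORT, timeout=30):
    """Issue the HTTP GET and return the HTTP status code."""
    with urlopen(delta_import_url(base), timeout=timeout) as resp:
        return resp.status


print(delta_import_url())  # http://localhost:8983/solr/dataimport?command=delta-import
```

A tiny script that calls trigger_delta_import(), run from cron every so often, is all the scheduling there is; the interesting questions (duplicate tags, boost values) remain open.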




I'm just saying. 